species occupy much smaller ranges than the human species, presumably because
reproductive barriers were favored by selection as successful primates extended 
their ranges to sufficiently different habitats. 

Unlike other mammals, humans acquire massive amounts of adaptive information
culturally. Perhaps it is not coincidental that symbol-using humans of the
late Pleistocene epoch became very widely distributed for a biological species. 
The processes modeled here, by allowing the protection of culturally transmitted 
adaptations to local conditions without genetic isolation, can be considered a 
cultural substitute for speciation. Undoubtedly many aspects of cultural
transmission allow adaptation to a wide range of habitats. However, it does seem
plausible that the fact that the human species is divided into distinct groups that 
are culturally isolated from each other may play a role in allowing humans to be 
culturally polymorphic and thus to occupy such a wide range of ecological niches. 
This intuition is reinforced by studies like those of Fredrik Barth, which suggest 
that contemporary ethnic groups often occupy different ecological niches. 

This interpretation illustrates, in the context of a rather simple model, how
adaptive modes of cultural transmission lead to outcomes that could not be
predicted without taking cultural processes explicitly into account. Even if one
assumes that the criteria by which success is judged are coincident with
reproductive success, only the properties of cultural transmission allow populations to
adapt rapidly to a variable environment. An adaptive outcome, the differentiation
of local groups with regard to marker traits, can be understood only in
terms of cultural processes. We believe that this argument ought to be very 
interesting to cultural anthropologists. We have not had to leave the confines of 
adaptationist assumptions to show how the properties of culture play a fundamental 
role in human evolution. 

However, once the use of such rules as success and similarity arises, selection
on genes underlying the capacity for culture may not be able to prevent the 
violation of adaptationist assumptions. For example, processes closely related 
to those modeled here can lead to the “runaway” evolution of marker and
preference traits, which have no adaptive or functional explanation (Boyd and
Richerson, 1985, ch. 8). It is easy to imagine that the adaptive uses of cultural
markers are common enough so that selection on genes maintains a cognitive
capacity to use them despite the runaway process carrying some to maladaptive
extremes. We are convinced that complexities of this sort are a pervasive feature 
of the coevolutionary process that links genes and culture. If this idea is correct, 
any attempt to reduce the problems of human evolution to binary choices
between sociobiological and cultural explanations is bound to fail. The real puzzle
is to determine how the genetic and cultural systems interact in a unified
evolutionary process.


NOTE 

We thank Bruce Knauft, Robert Paul, and Joan Silk for thoughtful comments on the 
first draft of this chapter. 

Shared Norms and the
Evolution of Ethnic Markers 

With Richard McElreath 


Unlike other primates, human populations are often divided into 
ethnic groups that have self-ascribed membership and are marked by seemingly 
arbitrary traits such as distinctive styles of dress or speech (Barth, 1969, 1981). 
The modern understanding that ethnic identities are flexible and ethnic 
boundaries porous makes the origin and existence of such groups problematic 
because the movement of people and ideas between groups will tend to
attenuate group differences. Thus, the persistence of existing boundaries and the birth
of new ones suggest that there must be social processes that resist the
homogenizing effects of migration and the strategic adoption of ethnic identities.

One recurring intuition in the social sciences is that, since ethnic markers 
signal ethnic group membership and ethnic groups are often loci of cooperation, 
markers persist because they allow people to direct altruistic behavior selectively 
toward coethnics (Van den Berghe, 1981; Nettle and Dunbar, 1997). On closer 
analysis, however, this argument turns out not to be cogent. Altruism can evolve 
only if some cue allows altruists to interact with each other preferentially so that 
they receive a disproportionate share of the benefits of altruism. One such cue 
is kinship (Hamilton, 1964), and another is previous behavior (Trivers, 1971; 
Axelrod, 1984). Another idea is that selection might favor altruists who carried 
an external, visible marker that would allow them to limit their cooperation to 
others who exhibited the marker. However, evolutionary theorists argue that 
this mechanism is unlikely to be important (Hamilton, 1964; Grafen, 1990). 
Nonaltruists with the marker do best because they get the benefit without paying 
the cost. Thus, if any process breaks up the association between the cooperator 
strategies and the markers, such individuals will rapidly proliferate and altruists 
will disappear. 

Here we argue that markers function to allow individuals to interact with
others who share their social norms. We present a simple mathematical model
showing that marked groups can arise and persist if three empirically plausible
conditions are satisfied: (1) Social behavior in groups is regulated by norms in such
a way that interactions between individuals who share beliefs about how people
should behave yield higher payoffs than interactions among people with discordant
beliefs. (2) People preferentially interact with people with whom they share
easily observable traits like dress style or dialect. (3) People imitate successful 
people, with the result that behaviors that lead to higher payoffs tend to spread. 
We also show that the preference to interact with people with markers like one’s 
own may be favored by natural selection under plausible conditions. We conclude 
by outlining several qualitative, empirically testable predictions of our model. 


A Simple Model of the Evolution of Ethnic Markers 

Consider a population divided into a number of large groups. In each time period, 
each individual interacts with another individual from the same group. People’s 
behavior in these interactions depends on culturally acquired beliefs. We will 
refer to this culturally transmitted belief as the behavioral trait. There are two 
alternative beliefs, labeled 1 and 0. Individuals’ payoffs from the social interaction 
depend on their own behavior and the behavior of their partners in the way given 
in table 7.1. This simple coordination game is meant to capture the intuition that 
many real social interactions go well if people have the same beliefs about proper 
behavior. It is likely that human societies face many problems of this kind. 
A familiar example is the set of problems in cross-cultural communication
that result from different expectations about interactions and different codes for
communicating (Gumperz, 1982). The parameter δ measures the strength of this
effect. 
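
The payoff structure of this coordination game can be written down in a few lines. The following is a minimal sketch, not the authors' code: the function names `coordination_payoff` and `expected_payoff` are hypothetical, and the second function assumes that the partner is drawn at random from a group in which behavior 1 has frequency `p1`.

```python
def coordination_payoff(own: int, partner: int, delta: float) -> float:
    """Payoff to the focal player: 1 + delta if behaviors match, else 1."""
    if delta <= 0:
        raise ValueError("delta is assumed to be positive")
    return 1.0 + delta if own == partner else 1.0


def expected_payoff(own: int, p1: float, delta: float) -> float:
    """Expected payoff when the partner uses behavior 1 with probability p1.

    Assumes random pairing within the group (no marker-based assortment).
    """
    p_match = p1 if own == 1 else 1.0 - p1
    return 1.0 + delta * p_match
```

Under random pairing the more common behavior always earns the higher expected payoff, which is the rare-type disadvantage that the rest of the model builds on.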

We also assume that it is difficult to determine another individual’s beliefs 
about proper behavior before an interaction occurs. Given the large number of 
norms and the fact that some of them will be used only a few times in one's 
lifetime (Nave, 2000), people cannot always reliably predict the behavior of 
everyone they must interact with or even predict their own behavior, since many 
such norms are unconsciously held. Much the same argument can be made for 
rules enforced by third-party punishment. A stranger who moves to a new village
cannot guess ahead of time all of the social rules that regulate behavior in his new
home. People may be able to tell him some of the things that he needs to know,
but it is still likely that he will make many costly social blunders, perhaps even
run afoul of basic moral principles (field anthropologists should be familiar with
this sort of problem). As long as people are sometimes ignorant in these ways,
people with uncommon behaviors will be at a disadvantage, and the model
targets these situations, not the entire scope of interaction.

Table 7.1. Payoffs in the coordination game

                         Player 2’s behavior
Player 1’s behavior      1          0
1                        1 + δ      1
0                        1          1 + δ

Note: Payoffs shown for player 1; δ is assumed to be positive.

Of course, people have many traits, such as dialect, clothing style, and 
cuisine, that can be observed, and often these traits are the basis of assortative 
social interaction. To formalize this idea, we assume that there is also a readily 
observable marker trait. This trait also has variants, labeled 0 and 1, and we 
assume that individuals tend to interact with others who have the same variant 
of marker trait. The strength of this propensity is given by the parameter e. 
When e = 1, individuals interact at random; when e = 0, they always interact
with someone with the same marker trait. 

There is much evidence that people who do well in life are more likely to be 
imitated (Henrich and Gil-White, 2001). To incorporate this process, we assume 
that the probability that an individual with behavior i and marker j will be imitated
is proportional to $W_{ij}/\bar{W}$, where $\bar{W}$ is the average payoff in the group. This
means that combinations of behavior and marker that lead to higher than average 
payoffs will be more likely to be imitated (see Gintis, 2000, for derivation). 
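
The imitation rule just described amounts to reweighting each behavior-marker combination by its payoff relative to the group mean. A minimal sketch follows; the function name `imitate` and the numerical frequencies and payoffs are hypothetical values chosen only for illustration.

```python
def imitate(freqs: dict, payoffs: dict) -> dict:
    """One round of payoff-biased imitation: x'_ij = x_ij * W_ij / W_bar."""
    w_bar = sum(freqs[t] * payoffs[t] for t in freqs)  # average payoff in the group
    return {t: freqs[t] * payoffs[t] / w_bar for t in freqs}


# Hypothetical frequencies and payoffs for the four (behavior, marker) types.
freqs = {(1, 1): 0.4, (1, 0): 0.3, (0, 1): 0.2, (0, 0): 0.1}
payoffs = {(1, 1): 1.4, (1, 0): 1.3, (0, 1): 1.1, (0, 0): 1.0}
updated = imitate(freqs, payoffs)
```

Types with above-average payoff gain frequency and types with below-average payoff lose it, while the frequencies continue to sum to one.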

With these assumptions it is possible to derive expressions that describe how 
imitation and social interaction change the frequency of the behavior and marker 
traits in each group. The change in the fraction of the people with behavior 1
within a group, $p_1$, is

$$\Delta p_1 = \delta U (p_1 - p_0)\,[1 - (1 - e) R^2] \qquad (1)$$

where $R = D/(UV)^{1/2}$ is the correlation of behavior and marker, U and V are
the variances of behavior and marker, and D is the covariance between marker
and behavior. If R = 1, everyone who has marker 1 also has behavior 1; if R = −1,
everyone who has marker 1 has behavior 0; and if R = 0, the traits are
randomly associated. Equation 1 says that if more individuals use behavior 1 than
behavior 0, behavior 1 increases; if fewer individuals use it, it decreases. The rate at which
this occurs depends on whether the marker allows individuals to interact
preferentially with people who have the same behavior. When $R^2$ is near 1, most
individuals with a given behavior have the same marker, and if e is small, they
almost always interact with individuals with the same behavior as themselves,
and thus there is little advantage in having the common behavior. When $R^2$ is
near zero, most interactions occur at random and individuals with the most
common behavior have an advantage.
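
The two limiting cases in this paragraph can be checked numerically. The sketch below evaluates the expression for the change in the frequency of behavior 1, taken here to be δU(p1 − p0)[1 − (1 − e)R²] with U = p1·p0 and V = q1·q0; the function name `delta_p1` is hypothetical, and the code assumes both traits are polymorphic so that U and V are nonzero.

```python
def delta_p1(p1: float, q1: float, D: float, delta: float, e: float) -> float:
    """Change in the frequency of behavior 1 per equation 1 (assumed form)."""
    U = p1 * (1.0 - p1)      # variance of the behavior trait
    V = q1 * (1.0 - q1)      # variance of the marker trait
    R2 = D * D / (U * V)     # squared correlation of behavior and marker
    return delta * U * (p1 - (1.0 - p1)) * (1.0 - (1.0 - e) * R2)


# No covariance: the common behavior has its full advantage.
no_covariance = delta_p1(0.6, 0.5, 0.0, delta=0.5, e=0.25)

# Perfect correlation with perfect assortment: the advantage vanishes.
perfect = delta_p1(0.6, 0.6, 0.6 * (1.0 - 0.6), delta=0.5, e=0.0)
```

With R² = 0 the bracket equals 1, so the common behavior spreads at rate δU(p1 − p0); with R² = 1 and e = 0 the bracket is 0 and frequencies do not change.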

The change in frequency of the marker 1, $q_1$, is approximately given by
equation (2):

$$\Delta q_1 \approx 2 \delta D (p_1 - p_0)\left(1 - \frac{e}{2}\right) \qquad (2)$$

This expression is valid when the covariance between marker and behavior is
small, that is, when individuals’ markers predict little about their behavior. When D is
positive, marker 1 is associated with behavior 1, and if behavior 1 increases, so
does marker 1. The complete expression for the change in $q_1$ shows that this
effect decreases as D becomes larger.
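
The small-covariance approximation for the marker can be compared with an exact imitation step. The sketch below rests on an assumption that is not spelled out in the text: with probability 1 − e an individual is paired with someone carrying the same marker, and with probability e with a random group member. Under that rule a first-order expansion in D gives 2δD(p1 − p0)(1 − e/2), up to division by the mean payoff. All function names are hypothetical.

```python
def type_payoffs(x, delta, e):
    """Expected payoff of each (behavior, marker) type under the assumed pairing rule."""
    p = {b: x[(b, 0)] + x[(b, 1)] for b in (0, 1)}  # behavior frequencies
    q = {k: x[(0, k)] + x[(1, k)] for k in (0, 1)}  # marker frequencies
    return {
        (b, k): 1.0 + delta * ((1.0 - e) * x[(b, k)] / q[k] + e * p[b])
        for b in (0, 1) for k in (0, 1)
    }


def exact_delta_q1(x, delta, e):
    """Exact change in marker 1 after one round of payoff-biased imitation."""
    w = type_payoffs(x, delta, e)
    w_bar = sum(x[t] * w[t] for t in x)
    q1_before = x[(0, 1)] + x[(1, 1)]
    q1_after = (x[(0, 1)] * w[(0, 1)] + x[(1, 1)] * w[(1, 1)]) / w_bar
    return q1_after - q1_before


def approx_delta_q1(p1, D, delta, e):
    """Small-covariance approximation (assumed form of equation 2)."""
    return 2.0 * delta * D * (p1 - (1.0 - p1)) * (1.0 - e / 2.0)


def joint(p1, q1, D):
    """Joint type frequencies with behavior frequency p1, marker frequency q1, covariance D."""
    return {(1, 1): p1 * q1 + D, (1, 0): p1 * (1 - q1) - D,
            (0, 1): (1 - p1) * q1 - D, (0, 0): (1 - p1) * (1 - q1) + D}


# Illustrative values: weak payoff effect and small covariance.
x = joint(0.7, 0.5, 0.01)
exact = exact_delta_q1(x, delta=0.05, e=0.3)
approx = approx_delta_q1(0.7, 0.01, delta=0.05, e=0.3)
```

For these illustrative values the two quantities agree to within a few percent; the remaining gap comes from dividing by the mean payoff, which the approximation ignores.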

Because the effects of social interaction and learning depend critically on the 
covariance between behavior and marker (D), we also need to know how they 
affect the covariance. Social interaction and imitation increase covariance
between marker and behavior when the covariance is small. The reason is simple:
individuals with the most common combinations of behavior and marker are 
more likely to interact with others with the same behavior and thus achieve a 
higher payoff. 

We then represent population mixing due to intermarriage, relocation, and 
other factors with a migration phase that removes a proportion m of each group 
and replaces it with migrants drawn from neighboring groups. Clearly, such 
mixing will reduce the differences in the frequencies of both behavior and 
marker between neighboring groups. However, migration also has a less obvious 
and very important effect: as long as there is any difference in the frequencies of 
marker and behavior between neighboring groups, migration increases the
covariance between marker and behavior within groups:

$$\Delta D = m\{\bar{D} - D + (\bar{p}_1 - p_1)(\bar{q}_1 - q_1)\} \qquad (3)$$

where $\bar{p}_1$, $\bar{q}_1$, and $\bar{D}$ are the average frequencies of behavior and marker and the
covariance between behavior and marker in neighboring groups that provide
immigrants. To understand why mixing increases the covariance within groups, 
consider the case in which the frequency of marker and behavior is 0.9 in one 
group and 0.1 in a second group. Further suppose that the covariance between 
marker and behavior within both groups is zero, and therefore the marker is 
useless as a predictor of behavior. Now suppose that we mix the two groups 
completely. Most of the individuals coming into the first group will carry both 
marker and behavior 0, while those coming into the second will carry both 
marker and behavior 1. The frequency of both markers and both behaviors will 
be 0.5, but most (82%) of the individuals in the population will be either 1,1 or 
0,0, with the result that markers are now good predictors of behavior within 
groups. 
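
The arithmetic of this two-group example can be reproduced directly. The sketch below builds each group with independent marker and behavior (zero covariance), pools the groups in equal shares, and reads off the share of matched types and the resulting covariance. The helper names are hypothetical.

```python
def independent_types(p1, q1):
    """Joint (behavior, marker) frequencies with zero covariance."""
    return {(b, k): (p1 if b else 1 - p1) * (q1 if k else 1 - q1)
            for b in (0, 1) for k in (0, 1)}


def covariance(x):
    """Covariance between behavior 1 and marker 1."""
    p1 = x[(1, 0)] + x[(1, 1)]
    q1 = x[(0, 1)] + x[(1, 1)]
    return x[(1, 1)] - p1 * q1


group_a = independent_types(0.9, 0.9)   # marker 1 and behavior 1 both common
group_b = independent_types(0.1, 0.1)   # marker 1 and behavior 1 both rare
pooled = {t: 0.5 * group_a[t] + 0.5 * group_b[t] for t in group_a}

matched_share = pooled[(1, 1)] + pooled[(0, 0)]   # individuals who are 1,1 or 0,0
pooled_covariance = covariance(pooled)
```

Each group starts with zero covariance, yet the pooled population has 82% matched types and a covariance of 0.16, purely because the groups differed in both trait frequencies.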

Finally, suppose that individuals sometimes acquire marker and behavior 
traits from different individuals, which leads to the randomization of behavior 
and marker—a process we term recombination. Recombination has no effect on 
the frequencies of behavior and marker, but it reduces the covariance between 
marker and behavior at a rate proportional to r. 
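
One simple way to formalize recombination, offered here as an assumption rather than as the authors' exact recursion, is to let a fraction r of individuals take behavior and marker from independently chosen models. Trait frequencies are then unchanged and the covariance shrinks by the factor 1 − r. The function names are hypothetical.

```python
def covariance(x):
    """Covariance between behavior 1 and marker 1."""
    p1 = x[(1, 0)] + x[(1, 1)]
    q1 = x[(0, 1)] + x[(1, 1)]
    return x[(1, 1)] - p1 * q1


def recombine(x, r):
    """A fraction r acquires behavior and marker from different, independent models."""
    p = {b: x[(b, 0)] + x[(b, 1)] for b in (0, 1)}
    q = {k: x[(0, k)] + x[(1, k)] for k in (0, 1)}
    return {(b, k): (1 - r) * x[(b, k)] + r * p[b] * q[k]
            for b in (0, 1) for k in (0, 1)}


# Illustrative starting point with covariance 0.16.
before = {(1, 1): 0.41, (1, 0): 0.09, (0, 1): 0.09, (0, 0): 0.41}
after = recombine(before, r=0.1)
```

Here the covariance falls from 0.16 to 0.144 while the frequencies of behavior 1 and marker 1 both stay at 0.5.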


Simulation Results 

We have derived recursions that give the net effect of imitation, migration, and 
recombination on the frequencies of behavior and marker and the covariance 
between them. However, these recursions are too complex to solve analytically, 
and we have, therefore, relied on numerical simulation. We begin by describing 
simulations of the model when there are only two interacting populations. This 
system provides an intuition for the processes that sometimes give rise to marked
groups. We then explore the parameter space of the model, varying e (the
chance of interacting at random), m (migration), δ (the effects of social behavior
on individual welfare), and r (the rate of recombination) to map the range of
conditions under which marked groups arise. Finally, we generalize the model,
allowing larger numbers of populations and a general coordination game
structure. These analyses suggest that the simple model is relatively robust.
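
A two-population simulation of this kind can be sketched as follows. This is not the authors' code, and several choices are assumptions: the order of events within a generation (imitation, then migration, then recombination), the pairing rule (same-marker partner with probability 1 − e, random partner otherwise), the recombination rule, and the illustrative starting values, including δ = 0.5.

```python
TYPES = [(b, k) for b in (0, 1) for k in (0, 1)]  # (behavior, marker)


def marginals(x):
    """Behavior and marker frequencies, each as a dict keyed by variant."""
    p1 = x[(1, 0)] + x[(1, 1)]
    q1 = x[(0, 1)] + x[(1, 1)]
    return {1: p1, 0: 1.0 - p1}, {1: q1, 0: 1.0 - q1}


def covariance(x):
    p, q = marginals(x)
    return x[(1, 1)] - p[1] * q[1]


def imitation(x, delta, e):
    """Payoff-biased imitation under the assumed pairing rule."""
    p, q = marginals(x)
    w = {}
    for b, k in TYPES:
        match_given_marker = x[(b, k)] / q[k] if q[k] > 0 else 0.0
        w[(b, k)] = 1.0 + delta * ((1.0 - e) * match_given_marker + e * p[b])
    w_bar = sum(x[t] * w[t] for t in TYPES)
    return {t: x[t] * w[t] / w_bar for t in TYPES}


def migration(x, source, m):
    """Replace a proportion m of the group with migrants from the other group."""
    return {t: (1.0 - m) * x[t] + m * source[t] for t in TYPES}


def recombination(x, r):
    """A fraction r acquires behavior and marker from different models."""
    p, q = marginals(x)
    return {(b, k): (1.0 - r) * x[(b, k)] + r * p[b] * q[k] for b, k in TYPES}


def generation(pops, delta, e, m, r):
    learned = [imitation(x, delta, e) for x in pops]
    mixed = [migration(learned[0], learned[1], m),
             migration(learned[1], learned[0], m)]
    return [recombination(x, r) for x in mixed]


def independent_start(p1, q1):
    """Initial type frequencies with no covariance within the population."""
    return {(b, k): (p1 if b else 1.0 - p1) * (q1 if k else 1.0 - q1)
            for b, k in TYPES}


def simulate(pops, generations, delta, e, m, r):
    for _ in range(generations):
        pops = generation(pops, delta, e, m, r)
    return pops


# Illustrative start: behavior 1 slightly more common in population 1,
# marker 1 rare in both populations.
final = simulate([independent_start(0.55, 0.2), independent_start(0.45, 0.3)],
                 500, delta=0.5, e=0.25, m=0.025, r=0.1)
for i, x in enumerate(final, start=1):
    p, q = marginals(x)
    print(f"population {i}: p1={p[1]:.3f} q1={q[1]:.3f} D={covariance(x):+.3f}")
```

Tracking the four type frequencies rather than p, q, and D separately keeps every step linear or a simple reweighting, so the covariance generated by migration falls out automatically.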

1. Stable behavioral differences between groups usually become ethnically 
marked. Social interaction alone can lead to the evolution of stable differences in 
behavior between two groups. People with more common behaviors achieve 
higher payoffs in the coordination game and are more likely to be imitated. Thus, 
if one behavior is initially common in one group and the alternative behavior is 
initially common in the other group, payoffs from social behavior coupled 
with imitation of the successful will cause the groups to become more different. 
If the diversifying effect of payoff-biased imitation is sufficiently strong
compared with the homogenizing effect of migration, the two populations will reach
an equilibrium at which behavior 1 is common in group 1 and behavior 0 in 
group 2. In contrast, if the rate of mixing is too high or if initially the same 
behavior is common in both populations, only one behavior will be present in 
both populations at equilibrium. 

If stable behavioral differences between groups exist, each behavior can 
become associated with a different marker variant—behavior 1 will, for example, 
be associated with marker 0 and behavior 0 with marker 1. Figure 7.1 illustrates 
this dynamic. Initially behavior 1 is more common in population 1 and less
common in population 2. Marker 0 is initially more common than marker 1 in both
populations but relatively more common in population 1 than in population 2.


Figure 7.1. The frequencies of each of the four combinations of behavior and marker
over time in each of two populations for m = 0.025, e = 0.25, and r = 0.1. The
behaviors are denoted by the shape of the symbol, circle (= 0) or square (= 1), and
the markers are denoted by color, black (= 0) or white (= 1). Initially behavior 1
(squares) has frequency 0.55 in population 1 and 0.45 in population 2. Marker 0
(black) is initially more common than marker 1 in both populations but relatively
more common in population 1 ($q_{11}$ = 0.8) than in population 2 ($q_{12}$ = 0.7).

There is no initial covariance within populations. At first, rare-type disadvantage
causes behavior 1 to become more common in population 1 and behavior 0 in
population 2. At the same time, migration generates a negative covariance
between marker and behavior so that behavior 1 tends to co-occur with marker 0
and behavior 0 with marker 1. This in turn strengthens the forces increasing the
differences between the populations in frequencies of marker and behavior, 
which then generates greater covariance. This positive feedback process (figure 
7.2) continues until a symmetrical equilibrium is reached at which a different 
behavior is common in each population and each behavior is associated with a 
different marker. The adaptive behaviors have become symbolically marked, 
even though the same marker was initially common in both groups. 

However, migration and recombination oppose the positive feedback process 
described. Migration tends to make the two populations the same, equalizing 
the frequency of the markers in each population, and recombination destroys 
the covariance between marker and behavior. If recombination is strong, it
dissipates the covariance between marker and behavior more rapidly than
migration and imitation can create it. Even though the payoff advantage of being in
the majority is sufficient to maintain behavioral differences between the two
populations, these differences do not become ethnically marked. When
individuals are unable to assort accurately on the basis of markers (e is large), the
pattern is similar: stable group differences in behavior may emerge and persist, 
but selection on markers is too weak to generate covariance between marker and 
behavior. 

The qualitative arguments are supported by systematic sensitivity analysis.
We determined the range of parameters under which groups become marked by
performing a large number of simulations. For each simulation we calculated the
value of D, the population average covariance between behavior and marker,
averaged over the 100 simulations. We held parameter values constant at
m = 0.01, e = 0.3, r = 0.01, δ = 0.5 for parameters not varied in a run of
simulations. Figure 7.3 summarizes these results. When biased imitation can maintain
stable behavioral differences in the face of migration, stable marker differences
evolve provided that (1) recombination (r) is not too strong and (2) individuals
interact sufficiently often with individuals like themselves (e is not too high).
There are no cases in which behavioral differences fail to evolve and marker
differences manage to become stable.

Figure 7.2. The feedback process that generates marked groups and the forces that
oppose this process.

2. Spatial structure is needed to generate ethnic markers but not to maintain 
them. Migration between groups generates the initial covariance essential for the 
evolution of ethnic markers. However, if individuals are able to use markers
to assort accurately (e ≈ 0), spatial structure is no longer necessary to maintain
ethnic markers once such covariance arises (figure 7.4) and groups end up mixed 
together in space, but high covariance between markers and behaviors remains. 
This configuration can be a stable equilibrium only if r and e are very small. 
However, for somewhat larger values of r and e, there is a long transition period 
during which two ethnically marked types are present without spatial variation. 
A more complex model in which groups occupied different niches would likely 
be able to sustain spatially mixed ethnically marked groups in a wider range of 
circumstances. Also, we will demonstrate later that natural selection would
reduce values of r and e if at all possible. This makes the possibility of the evolution
of such spatially blended systems more likely. Such situations are an interesting 
and unexpected outcome of our model. 

3. Increasing the number of populations increases the range of initial conditions
that give rise to ethnic markers. Random starting conditions (random frequencies
of behavior and marker in each group) often lead to the evolution of behaviorally
different and marked groups, and this result becomes more likely as more groups
are added to the system (figure 7.5). The two-group system is most sensitive to
starting conditions, as this case has the highest chance of randomly generating all
groups with similar initial behavior frequencies.

Figure 7.3. The evolution of stable marker differences. White regions are combinations of
parameter values that produced both stable behavioral and marker differences (that is,
these populations became ethnically marked). Black regions are cases in which behavioral
differences were stable but marker differences were not (that is, these populations
became culturally different but without ethnic markers). Gray regions are cases in which
behavioral differences failed to evolve, typically because of strong migration.






Figure 7.4. The frequencies of each of the four combinations of behavior and marker
over time in each of two populations. The behaviors are denoted by the shape of the
symbol, circle (= 0) or square (= 1), and the markers are denoted by color, black (= 0)
or white (= 1). The initial conditions and value of m are the same as in figure 7.1, but now
assortment is perfect, e = 0.0, and there is no recombination, r = 0.0. As before, at first
rare-type disadvantage causes behavior 1 to become more common in population 1
and behavior 0 in population 2, and migration generates a negative correlation between
marker 1 and behavior 0 (equation 4). However, because there is no recombination,
this covariance builds up much more rapidly, especially in population 1, in which the
initially relatively more common marker was also absolutely more common. The high
correlation between marker and behavior combined with the accurate assortment
eliminates rare-type disadvantage, and migration mixes the two groups until they are identical.
Because the covariance increases more rapidly in population 1, the marker-behavior 
variant in population 2 experiences a transient advantage that is preserved at equilibrium. 

Figure 7.5. Equilibrium absolute values of D (covariance in the population as a whole)
for simulations involving two groups (top, 100 simulations) and six groups (bottom,
100 simulations). Starting conditions were random with parameter values m = 0.025,
r = 0.10, e = 0.30, δ = 0.50. High D becomes more likely as the number of groups
increases.


4. Group differences are strongest at boundaries. When more than two
groups are arrayed in space, the correlation between marker and behavior
($R = D_i/\sqrt{U_i V_i}$) is greatest at the boundaries between culture areas. Figure 7.6
shows the steady state in ten populations arranged in a stepping-stone ring. This
steady state results from an initial clinal distribution of behavior and marker
frequencies with zero correlation between behavior and marker in each
population. There is a region of three populations in the middle in which the
frequency of marker 1 and behavior 1 is low and a region of three populations
at the edges in which these frequencies are high (remember that the populations
wrap around so that population 1 exchanges migrants with population
10). In both of these regions there is little or no correlation between marker
and behavior. In between these regions are boundary areas in which frequencies
are intermediate and there is substantial correlation between marker and
behavior.

Figure 7.6. The steady state that arises from slightly clinal initial distributions of the
frequencies of marker 1 and behavior 1 in ten populations arranged in a ring. Broken line,
$p_1$; heavy solid line, $q_1$; light solid line, R.
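
The wrap-around migration in a stepping-stone ring can be sketched in a few lines. The sketch assumes that each population draws its migrants in equal shares from its two adjacent populations, which the text does not state explicitly; the function name `ring_migration` is hypothetical.

```python
def ring_migration(values, m):
    """Mix each population with the mean of its two neighbors on a ring.

    `values` holds one frequency per population; index -1 wraps to the last
    population, so the first and last populations exchange migrants.
    """
    n = len(values)
    return [
        (1.0 - m) * values[i] + m * 0.5 * (values[i - 1] + values[(i + 1) % n])
        for i in range(n)
    ]


# Ten populations; only the last one carries the trait at the start.
start = [0.0] * 9 + [1.0]
after = ring_migration(start, m=0.1)
```

Applying the same mixing to each of the four behavior-marker type frequencies, rather than to p and q separately, would also carry the covariance generated at boundaries between culture areas.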

5. A more general model of social interaction leads to similar results. So far, 
we have assumed that social interaction can be modeled by a game of pure
coordination with equal average payoffs for both equilibria. Symmetric, pure
coordination games are very special because the basins of attraction of the two
equilibria are the same size. To test whether our results were sensitive to this 
assumption, we ran a number of simulations in which we varied the parameters 
of the completely general two-person coordination game shown in table 7.2. 

The results indicate that the system regularly evolves toward marked,
behaviorally distinct groups even when there are large deviations from the perfect
coordination structure. Thus, our results do not depend in a sensitive way on the 
perfect nature of the game structure we have chosen. This suggests that any 
stable behavioral equilibria, regardless of their relative consequences for group or 
individual welfare, may become marked. 

Table 7.2. Payoffs in a general two-person game with two stable equilibria

                         Player 2’s behavior
Player 1’s behavior      1              0
1                        1 + δ + g      1 − h
0                        1              1 + δ

Note: Payoffs shown for player 1; δ, g, and h are assumed to be positive.
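
To see how g and h change the basins of attraction, one can compute the frequency of behavior 1 at which both behaviors earn the same expected payoff. The sketch below assumes random pairing within a group (no marker-based assortment) and reads the payoffs as 1 + δ + g for (1, 1), 1 − h for (1, 0), 1 for (0, 1), and 1 + δ for (0, 0); the function names are hypothetical.

```python
def general_payoff(own, partner, delta, g, h):
    """Payoff to the focal player in the general two-person game."""
    table = {(1, 1): 1.0 + delta + g, (1, 0): 1.0 - h,
             (0, 1): 1.0, (0, 0): 1.0 + delta}
    return table[(own, partner)]


def expected_payoff(own, p1, delta, g, h):
    """Expected payoff when the partner uses behavior 1 with probability p1."""
    return (p1 * general_payoff(own, 1, delta, g, h)
            + (1.0 - p1) * general_payoff(own, 0, delta, g, h))


def basin_boundary(delta, g, h):
    """Frequency of behavior 1 at which both behaviors earn equal payoffs.

    Setting p1*(delta + g) = (1 - p1)*(delta + h) gives this threshold.
    """
    return (delta + h) / (2.0 * delta + g + h)


threshold = basin_boundary(delta=0.5, g=0.2, h=0.1)
```

With g = h = 0 the threshold is one half, recovering the symmetric game; raising g shrinks the basin of behavior 0, and raising h shrinks the basin of behavior 1.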


Evolutionary Stability of the Parameters 

This model depends on four parameters: m, δ, r, and e. The first two formalize
assumptions about the ecology of the evolving populations. The second pair of 
parameters represents assumptions about human psychology. The simulation 
results indicate that social interactions in which common behaviors have high 
payoff will lead to the evolution of ethnic markers if both e and r are small, or, in 
other words, if people have a psychology that predisposes them to interact with 
individuals with the same marker as themselves and to acquire some markers and 
behaviors as a package. Natural selection will, all other things being equal,
favor such a psychology (that is, selection will favor mutations that reduce the
values of e and r). However, selection on other aspects of social learning and 
demands on interaction may restrict the extent to which selection can reduce 
these parameters. 


Discussion 

We have argued that ethnic markers do not function to allow individuals to 
direct altruism to others like themselves because such a system cannot resist 
invasion by cheaters who signal altruistic intent but then do not deliver. In 
contrast, ethnic markers can signal one’s behavioral type when social interactions 
have a coordination structure because in such situations there is nothing to be 
gained from cheating. Both parties in the coordination setting gain the most 
when they honestly advertise their strategy, and as a result both the behavior and 
its advertisement spread when the successful are imitated. Axtell, Epstein, and 
Young (1999) have analyzed another model that is quite different structurally 
but works for similar reasons. 

The intuition that ethnic markers and cooperation are related is not, 
however, without merit. Humans are peculiar in that we often cooperate with 
large numbers of unrelated individuals. As we have argued, the existence of 
ethnic markers alone cannot explain the scale of human cooperation. Yet we 
have shown that markers may evolve when individuals interact in a two-person 
coordination game, and we believe that any process that leads groups to occupy 
multiple stable equilibria may produce the same result. Two of us have argued
at length elsewhere that human cooperation results from norms enforced by 
socially created rewards and punishments (Boyd and Richerson, 1990, 1992; 
Soltis, Boyd, and Richerson, 1995; Richerson and Boyd, 1998, 1999). If
punishment is sufficiently costly, such systems can stabilize a very wide range of
behavior. Then, competition between groups will lead to the spread of moral 
systems that enhance group survival, welfare, and expansion, including norms 
that lead to enhanced cooperation in economic and military activities. 

As a result, we expect that systems of moral norms, some of which create 
group-beneficial cooperation, should come to be marked by ethnic markers by 
the process described. Punishment transforms the prisoner’s dilemma structure 
of a cooperation problem into a coordination structure. The process we have 
described here can then lead to individuals selecting individuals with whom to 
cooperate on the basis of markers, but the markers themselves do not stabilize 
the cooperation. 


Corollaries and Predictions 

The goal of this kind of modeling study is to demonstrate the cogency of a 
deductive argument linking assumptions about microlevel social interactions to 
the empirically observable macrolevel social patterns that result. Accordingly, 
we conclude by describing several testable predictions of the model. 

Our analysis of the evolutionary stability of e and r makes two predictions 
about the psychological tendencies of human beings: 

1. Individuals in marked communities should prefer interaction with similarly 
marked individuals. Our analysis of the evolution of e, the rate at which
individuals interact at random with respect to markers, suggests that natural
selection or an analogous process operating on cultural rules for interaction should
reduce e to zero, if possible. Thus, to the extent that e represents a psychological 
bias toward interacting with those who look like oneself rather than the ability or 
freedom to interact with ones like oneself, we expect members of marked
communities to prefer individuals marked like themselves, at least when it comes to
coordination interactions. 

2. Individuals in marked communities should acquire bundles of at least some 
norm and marker traits. While the model does not suggest anything about the 
social learning of noncoordination behaviors and social markers, our analysis of 
the evolution of r, the rate of recombination of behavior and marker traits, 
predicts that, for our model to be relevant, individuals should acquire norm and 
marker traits as a bundle. They should also preserve these associations
throughout substantial portions of their life spans. If this is not true, the process we
describe here is unlikely to work. 

The model makes three clear predictions about the nature of the
distributions of marker traits and their relations to ethnic groups and their histories:

1. Ethnic differences should be stronger at boundary regions than deep within 
ethnic territories. Hodder (1977) suggests that this is true for some
ethnoarcheological data from the Lake Baringo region of Kenya, but the data are
inadequate to test this prediction. The appropriate test would be examination of
a large ethnic group, such as the Kikuyu of Kenya, which interacted at many
border areas with a number of different ethnic groups. Another setting that holds
promise for testing this prediction is fragmentary migration that brings smaller 
units of a larger ethnic population into contact with other ethnic groups. If these 
groups are on average more marked than their source populations, we may be 
able to conclude that interaction with the other ethnic groups has increased 
selection on markers and magnified initial differences in those settings. 

2. Norm and marker boundaries should coincide, while the distributions of other 
culture items may map onto one another differently. Our model makes no predictions
about the nature of all cultural traits and the distribution of ethnic markers.
However, if this model is correct, a number of norm differences—on beliefs in 
inheritance, child rearing, household labor, and other categories of human life in 
which there are multiple coordinated solutions to the same problem—should 
correspond to the distributions of marker differences. 

3. Potential marker traits with the greatest initial differences should become
marked first. One test of this prediction would be to examine ethnographic 
settings in which two isolated source populations have contributed migrant 
groups that have since been in contact for some time. The source populations 
provide estimates of the initial differences in the migrant groups when they came 
into contact. The migrant groups provide estimates of the differences that might 
have grown from those initial differences. This prediction will earn support if the 
traits with greater differences between source populations appear to have led to 
marked traits in the contact groups. 


NOTE 

Supplementary material appears in the electronic edition of Current Anthropology 
44 (2003) on the journal's web page (http://www.journals.uchicago.edu/CA/ 
home.html).


PART 3

Human Cooperation, Reciprocity,
and Group Selection


A number of years ago the Cambridge paleoanthropologist Rob 
Foley published a book on the evolutionary ecology of early hominins entitled 
Another Unique Species. The title was meant to capture the idea that while 
humans are unique in many ways, so too is every other species. We like the 
book very much, but perhaps the title is a bit misleading. Humans are, if you 
will allow us, “more unique” than any other primate. We are extreme outliers 
in our use of tools, in our ecological and geographical range, in the richness of 
our communication system, and so on and on. Perhaps the most singular 
feature of Homo sapiens is the scale on which humans cooperate. In most other 
species of mammals cooperation is limited to close relatives and (maybe) 
small groups of reciprocators. After weaning most individuals acquire 
virtually all of the food that they eat. There is little division of labor, no trade, 
and no large-scale conflict. Amend Hobbes to account for nepotism, and 
his picture of the state of nature is not so far off for other mammals. In 
contrast, people in even the simplest human societies regularly cooperate with 
many unrelated individuals. Sharing leads to substantial flows of food and 
other resources among different age and sex classes. Division of labor and 
trade are prominent features of every historically known human society, 
and archaeology indicates that such trade has a long history. Violent conflict 
among groups is also quite common. Since the development of agriculture 
10,000 years ago, the scale of human cooperation has steadily increased so that 
most people on earth today are enmeshed in immense cooperative institutions 
like universities, business firms, religious groups, and nation states. Moreover, 
experimental work, both in psychology and economics, indicates that people 
have social preferences that incline them to such cooperation (see Fehr and 
Fischbacher, 2003, for a review). In the laboratory, people behave altruistically
in anonymous one-shot interactions, sometimes for very large stakes.

Thus, we have an evolutionary puzzle. At some time in the not so distant 
past, say 5 million years ago, our ancestors lived in small kin-based societies 
like other apes. Then, sometime between then and now, human psychology 
changed in such a way that large-scale cooperation became common. What 
were the evolutionary processes that gave rise to this change? 

Ever since we started thinking about cultural evolution, we have thought 
that culture might provide the solution to this puzzle because it seems to 
generate lots of variation in social behavior among social groups. In other 
primate species there is little heritable variation among groups within a species. 
The behavior of groups depends on the habitat and ecology, the demographic
structure, and the personalities of particular individuals. But these
differences are small and ephemeral, and, as a consequence, group selection at 
the level of whole primate groups is not an important evolutionary force. In 
contrast, it is an empirical fact that there is much heritable cultural variation 
among human groups. Neighboring groups often have different languages, 
marriage systems, and property rights, and these differences persist for 
generations. This suggested to us that group selection might be a more 
important process shaping human behavior than the behavior of other animals. 
We have devoted quite a bit of our research effort to trying to gain a clearer 
understanding of this puzzle. This work is usefully divided into two parts. 

Studies of cultural group selection. First, we have studied models of cultural 
group selection and attempted to collect empirical data necessary to determine
whether the models are close to reality. We believe that the case for
cultural group selection is strong. 

Studies of the evolution of contingent cooperation. Many scholars in the 
evolutionary social science community believe that human cooperation is 
better explained by selection within groups that favored various forms 
of contingent cooperation. The idea is that during most of our evolutionary 
history, humans lived in small groups in which reciprocity and moralistic 
punishment supported cooperation. The psychological machinery that supported
these behaviors “misfires” in the larger societies of the last 10,000
years. We have been skeptical about this argument because many other 
mammals live in small social groups, yet none of them shows very much 
evidence of contingent cooperation beyond pairwise reciprocity. It seemed 
to us that the advantages created by wider cooperation within groups, such as
specialization, division of labor, risk spreading, and so on, are huge, and
lineages like ants and termites in which kin selection supports cooperation 
have been extremely successful. Thus, it seemed to us that if contingent 
cooperation could generate larger-scale cooperation, there ought to be lots of 
examples in nature. However, when we started thinking about this problem 
in the early 1980s, there was lots of work on the evolutionary theory of 
reciprocity among pairs of individuals, but very little about contingent 
cooperation in larger groups. So we undertook to develop theory in this area, 
and the results are reprinted here.

Studies of the Evolution of Contingent Cooperation

The modern theory of the evolution of reciprocity began in 1971 when 
Robert Trivers showed that contingent cooperation could be evolutionarily 
stable. His model goes roughly as follows: suppose that pairs of individuals 
interact repeatedly over time and that occasionally one member of a pair has 
the opportunity to provide a benefit, b, to the other at a cost, c, to itself. Now 
consider a population of reciprocators who help on the first interaction and 
keep helping as long as their partner helps. Trivers (apparently with help 
from W. D. Hamilton) showed that reciprocators can resist invasion by rare 
defectors who never help as long as the long-run benefit of mutual cooperation 
is greater than the short-run benefit that a defector gets by exploiting a
cooperator. (Or, more formally, when $t(b - c) > b$, where t is the average number
of helping opportunities for each pair of individuals.) This article has been 
widely cited and was the impetus for much empirical work on reciprocity. 
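
As a rough worked illustration of this condition, it can be rearranged to give the minimum number of helping opportunities that pairs must expect. The numerical values in the comment are illustrative assumptions, not figures taken from Trivers.

```latex
\[
t(b - c) > b
\quad\Longleftrightarrow\quad
t > \frac{b}{b - c}
\qquad (b > c > 0)
\]
% Illustrative values only: with b = 2 and c = 1, reciprocators resist
% rare defectors once pairs expect more than t = 2 helping opportunities.
```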

However, there is a big problem with this analysis: when individuals 
interact repeatedly, reciprocity is evolutionarily stable, but so is everything 
else. Unbeknownst to Trivers and most other biologists working on reciprocity,
game theorists in economics, political science, and mathematics had
been working on the closely related problem of rational behavior in repeated 
games. As Trivers noted in his article, his model of reciprocity can be formalized
as a repeated version of the famous prisoner’s dilemma game. What
Trivers apparently did not know is that by the late 1950s game theorists 
had proved that in a repeated prisoner’s dilemma (or, in fact, in any repeated 
game in which players can strongly affect each other’s payoffs) any pattern
of behavior can be sustained by mutual self-interest: all cooperation for sure,
but also all defection, or anything in between, as long as interactions go on long
enough. This important result was known as the “folk theorem” because 
nobody in the game theory community was exactly sure who first proved it, 
and though the theorem was widely known in that community, it wasn’t 
actually published until 1986 (Fudenberg and Maskin, 1986). The basic logic
of the folk theorem is simple. Suppose a strategy takes the form: do x, where 
x is some behavior, say alternating cooperate and defect, as long as the other 
guy does x. If the other guy does something else, defect forever. Once a 
strategy like this becomes common in a population, the only smart thing to do 
is x; otherwise, one will be punished by defection for the rest of the interaction.
If interactions go on long enough, the costs of such punishment will
exceed the short-run benefits of doing something other than x. Repeated 
interactions create the possibility of sanctions and any behavior that enough 
sanctioners are willing to sanction is an equilibrium. For the most part, the 
logic of the folk theorem applies to evolutionary theory, although a subtle and 
important difference affects the stability of punishment. We will return to 
this issue. The bottom line is that when everything is an equilibrium, showing
that reciprocity is an equilibrium too doesn’t really tell you much. We need
to know which equilibria are likely evolutionary outcomes and which are not.

In 1981 Robert Axelrod and W. D. Hamilton published an article in
Science that showed that reciprocating strategies were, in fact, the most likely 
evolutionary outcome. Standard game theory assumes that people seek to 
maximize their average payoff. In evolutionary terms, this is equivalent to
assuming that groups of interacting individuals are formed at random with
respect to genotype. (When individuals interact at random, their actions do not
change the relative fitness of other types in the population. Thus, all that 
matters is the effect of behavior on an individual’s own fitness.) Reciprocators, 
or, more precisely, individuals with genes that cause them to reciprocate, are 
as likely to initially interact with defectors (i.e., individuals with defector 
genes) as are other defectors. This is not a bad assumption for a large, mobile 
mammal like humans, because there is ample gene flow among social groups 
and, to a rough approximation, individuals do interact at random. However, a 
better approximation is to assume a small tendency to interact with genetically 
similar individuals. Reciprocators are slightly more likely to interact with 
other reciprocators than defectors are. Axelrod and Hamilton showed that 
even small amounts of assortative interaction allowed reciprocal strategies to 
invade when rare and stabilized them when common. The reason is easy to see. 
When strategies interact at random, and defection is common, there is no 
chance that individuals carrying rare reciprocating genes will meet. So the
long-run benefits associated with sustained cooperation are irrelevant. Reciprocators
get exploited, and that is that. However, when there is some assortative
interaction, rare reciprocators do occasionally meet, and if the long-run benefits
of cooperation are big enough, even a small amount of assortment can cause the 
average fitness of reciprocators to exceed the average fitness of defectors. To 
see the strength of this effect, suppose that b/c = 2, helping behavior that would
be favored only among full siblings. The following table calculates the amount 
of assortment necessary to cause reciprocating strategies to increase when rare. 
At even a modest number of interactions, the threshold value is very small. 

In dyads, a little kinship and a little repeat business can generate a lot of
cooperation.

Expected number of interactions    1     3      7       15      49
Threshold value of r               .5    .25    .125    .0625   .02
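
The thresholds in the table can be reproduced with a short calculation. The sketch below is ours, and it rests on a payoff accounting that the text does not spell out: a rare reciprocator is paired with another reciprocator with probability r and then earns t(b - c); otherwise it is paired with a defector and pays the cost c once; defectors, who almost always meet other defectors, earn nothing. The function name `threshold_assortment` is hypothetical.

```python
from fractions import Fraction


def threshold_assortment(t, b=2, c=1):
    """Smallest r at which rare reciprocators break even against defectors.

    Assumed accounting (an illustration, not stated in the text):
      - with probability r the partner is a reciprocator: payoff t * (b - c)
      - with probability 1 - r the partner is a defector: payoff -c (helped once)
      - common defectors meeting defectors earn 0
    Setting r * t * (b - c) - (1 - r) * c = 0 and solving for r gives the result.
    """
    return Fraction(c, t * (b - c) + c)


# b/c = 2, as in the text; t is the expected number of interactions.
for t in (1, 3, 7, 15, 49):
    r = threshold_assortment(t)
    print(f"t = {t:2d}  threshold r = {float(r)}")
```

Under these assumptions the threshold is simply c / (t(b - c) + c), so with b/c = 2 it falls as 1/(t + 1), which is why even a modest amount of repeat business makes the required assortment small.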


Axelrod and Hamilton were also concerned with whether reciprocating strategies
could do well in more complex social environments in which many different
strategies were common. They famously championed a particular reciprocating 
strategy, tit-for-tat, showing that it did well in computer tournaments against a 
wide range of strategies. Subsequent research has shown that tit-for-tat is really 
not such a good strategy if individuals make mistakes. Other reciprocating 
strategies such as “contrite tit-for-tat” (Sugden, 1986; Boyd, 1989) and
“Pavlov” (Boerlijst, Nowak, and Sigmund, 1998) are really more robust. 
Nonetheless, their basic conclusion holds true. Given quite plausible 
assumptions, reciprocating strategies can increase when rare, can continue to 
increase under a range of assumptions, and can persist when common.

Axelrod and Hamilton’s (1981) article, and most of the work that
followed it, deals with reciprocity among pairs of individuals. Many authors 
interested in human behavior have assumed that the conclusions of this work 
can be extended to cooperation in larger groups (e.g., Trivers, 1971). We 
know from everyday experience that groups of people can organize contingent 
cooperation. Committees, sports teams and many similar groups work that 
way. So even though the theory applies to pairs, the general result seems to 
apply to larger groups. Several chapters included here resulted from checking 
to see if the theory of evolution of contingent cooperation applies to larger 
groups. 

In our first effort (chapter 8), we extended the Axelrod-Hamilton analysis 
to groups of people repeatedly interacting in an n-person prisoner’s dilemma.
During each interaction, individuals can cooperate, producing a benefit, b/n,
for all players including themselves at a cost, c, to themselves. Thus, if
everyone cooperates, they achieve a long-run payoff, $t(b - c)$. As in the
two-person case, however, defectors achieve a short-term payoff, now $b(n - 1)/n$,
by free-riding on the cooperative payoffs of others. We consider a family of
reciprocating strategies that generalize tit-for-tat to larger groups. Namely, the
strategy $T_j$ cooperates on the first interaction and on subsequent interactions if
at least j of the n - 1 other individuals cooperated during the previous interaction.
Thus, $T_0$ individuals always cooperate; $T_{n-1}$ individuals cooperate only if
everyone else cooperated on the previous turn.

The equilibrium behavior of this model is qualitatively similar to the 
two-person case. As always, defection is evolutionarily stable. Contingent 
cooperation can be evolutionarily stable, but only if reciprocating strategies do 
not tolerate defection. A population in which the strategy $T_{n-1}$ is common will
resist invasion by rare mutant defectors if the long-run benefit of cooperation
exceeds the short-term advantage of free-riding. However, none of the other
more tolerant reciprocating strategies can resist invasion by defectors. For
example, when $T_{n-2}$, the strategy that tolerates one defector in its group, is
common, rare defectors will get the long-run benefits of cooperation without
paying the cost and thus will increase in frequency. It turns out that
strategies like $T_{n-2}$ that tolerate a few defectors can persist in mixed stable
equilibria with defectors, but interactions must go on for a very long time.
Thus, as in the two-person case, virtually any kind of behavior can be
evolutionarily stable.

Our analysis of this model indicates that as groups get bigger, reciprocity 
becomes a much less likely evolutionary outcome. Once again, suppose 
that interacting groups are formed assortatively of relatives with degree of 
relatedness r. Then rare reciprocators using the potentially evolutionarily 
stable strategy $T_{n-1}$ can invade if

\[
\underbrace{\bigl(r(n - 1) + 1\bigr)\frac{b}{n} - c}_{\text{inclusive fitness}}
\;+\;
\underbrace{r^{\,n-1}(t - 1)(b - c)}_{\text{reciprocity}}
\;>\; 0
\]

The first term on the left-hand side gives the inclusive fitness of rare
reciprocators during the first interaction. If it is positive, cooperation pays
even without reciprocity. The second term gives the increase in the fitness
of reciprocators due to ongoing interactions in those groups in which
reciprocation is sustained. As in the two-person case, this term increases
linearly with the average number of interactions (t); repeat business makes
reciprocation pay. However, also notice that the second term decreases
geometrically with group size because cooperation is sustained only in groups
of all reciprocators.
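
To get a feel for how quickly the reciprocity term fades with group size, the invasion condition can be evaluated numerically. The following is a minimal sketch, not the analysis of chapter 8: the function names are hypothetical, the parameter values (b = 2, c = 1, t = 50) are illustrative assumptions, and the threshold is found by simple bisection, which relies on b > c > 0 and t >= 1 so that the expression increases with r and is positive at r = 1.

```python
def invasion_value(r, n, t, b, c):
    """Left-hand side of the invasion condition for rare T_(n-1) reciprocators."""
    inclusive_fitness = (r * (n - 1) + 1) * (b / n) - c  # first interaction
    reciprocity = r ** (n - 1) * (t - 1) * (b - c)       # later interactions
    return inclusive_fitness + reciprocity


def threshold_relatedness(n, t, b, c, tol=1e-9):
    """Approximate the smallest r in [0, 1] at which the condition holds.

    Assumes b > c > 0 and t >= 1, so the expression is increasing in r
    and positive at r = 1; bisection then brackets the unique crossing.
    """
    if invasion_value(0.0, n, t, b, c) > 0:
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if invasion_value(mid, n, t, b, c) > 0:
            hi = mid
        else:
            lo = mid
    return hi


# Illustrative parameters only (not taken from the text).
for n in (4, 8, 32):
    print(n, round(threshold_relatedness(n, t=50, b=2.0, c=1.0), 4))
```

With these assumed values the required relatedness rises with group size and, for large groups, approaches the level at which the inclusive fitness term alone is positive; the reciprocity term contributes almost nothing once n is large.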

Strategies supporting contingent cooperation in large groups have to 
achieve two competing desiderata. To be stable when common, they must be
intolerant of defection; otherwise, defectors will prosper. To increase when
rare, there must be a substantial chance that groups will have enough
reciprocators. As groups get larger, this becomes geometrically more difficult.

A number of people have suggested (e.g., Bendor and Mookerjee, 1987) 
that this analysis underestimates the problems facing reciprocity in larger 
groups because contingent cooperation in large groups will be much more 
sensitive to errors than it is in pairs. This claim is true of the particular
reciprocal strategies we analyzed, because a single error would lead to a collapse
of cooperation in the group. However, we do not think that it is a robust effect
because the reciprocating strategies in large groups can be modified to deal
with errors in much the same way that two-person strategies can. For example,
the n-person version of Pavlov would use the rule “cooperate if everyone or
no one cooperated on the last turn.” Then an error would create universal
defection, which, on the subsequent interaction, would then generate universal
cooperation. Strategies analogous to generous tit-for-tat likely could also
be designed to deal with errors in an n-person setting.

Colleagues have suggested to us that the n-person prisoner’s dilemma is
an extreme case because it assumes that noncooperators cannot be selectively 
excluded from enjoying the benefits of the cooperative act. For example, 
everybody gets the benefits of group defense whether they fight or not. 
Indeed, economists say that such goods are not “excludable.” Perhaps in many 
instances of cooperation in groups, noncooperators can be excluded. Take 
the classic example of food sharing among hunter gatherers. In most foraging 
groups, successful hunters share their catch with the rest of their group, a 
behavior sometimes explained as a reciprocal arrangement that reduces risk of 
starvation. Couldn’t earnest hunters easily exclude guys who don’t hunt? Just 
don’t give them a share of meat. Don’t we need to consider models in which 
the fruits of cooperation are at least partly excludable? Maybe, but the 
problem is a little trickier than it first appears. 

Excluding defectors is an example of a much more general phenomenon.
To prevent a defector from eating, somebody has to intervene when he 
reaches into the pot. That someone has to undertake a (perhaps) costly action 
that reduces the payoff of the defector and thus produces a benefit to the 
group as a whole. This is an example of what Trivers called “moralistic 
punishment” and applies to a much wider range of problems than excluding 
defectors from the fruits of cooperation. Even if the defectors cannot be 
excluded, punishment can create incentives for them to cooperate. Cowards
may get the benefits of group defense, but they may also be shunned, 
beaten, or banished. The real question is under what conditions can selection 
favor moralistic punishment? 

In chapter 9 we attempt to answer this question. The model assumes that 
individuals interact repeatedly in an n-person prisoner’s dilemma. After each
interaction, members have the opportunity to punish any other member of 
the group at a cost to themselves. We analyzed a variety of strategies, but here 
we begin by focusing on just two of them: moralistic punishers cooperate and 
punish defectors, and reluctant cooperators defect until they are punished, and 
then they cooperate. So that punishment could induce cooperation, we 
assume that the cost of being punished is greater than the cost of cooperating. 
Both types occasionally make mistakes and defect when they mean to 
cooperate. In this simple world, there are three types of stable equilibria. First, 
suppose reluctant cooperators are common in the population. They neither 
cooperate nor punish, so they achieve a payoff of zero. Rare mutant punishers 
will punish the n - 1 reluctant cooperators in their group and thereby induce
them to cooperate over the long run. If the long-run benefit of being in a
cooperative group is less than the one-time cost of punishing, reluctant
cooperators are an ESS. However, if the long-run benefit is greater than the cost of
punishing, moralistic punishment can invade even when groups are formed
at random. The fact that the reluctant cooperators do better than the
moralistic punishers in their group is unimportant when moralistic punishers are
rare because the vast majority of reluctant cooperators are in groups without a 
punisher. As moralistic punishers increase in frequency, however, more and 
more reluctant cooperators find themselves in groups with a punisher, and as a 
consequence their relative fitness increases. Eventually the fitness of the two 
types equalizes at a stable polymorphic equilibrium at which the population 
is a mix of cooperative and noncooperative groups. At this equilibrium, 
cooperation arises as a consequence of private individual benefit. We jokingly 
referred to this as the “big man” equilibrium after the famous political/economic
system common in New Guinea that it resembles. This model also
has a second, quite different kind of equilibrium. Suppose that moralistic 
punishers are common. Now rare reluctant cooperators are always punished 
by every other member of their group during the first interaction, and as long 
as the cost of this punishment is less than the cost to moralistic punishers 
of punishing the occasional error, then punishment can sustain cooperation. 
However, it can also stabilize almost any other behavior. The long-run benefits 
of cooperation are irrelevant to the stability of this equilibrium. This is the folk 
theorem again. If almost everybody is going to punish individuals for some
transgression, then individuals must do what the punishers want, no matter how foolish
it is in any other terms. 

We think these two very simple models capture a robust difference in 
contingent cooperation and moralistic punishment. Contingent cooperation 
strategies can be stable only if they insist that everyone in the group cooperate;
otherwise, they can be exploited. However, since such strategies increase
when rare with the greatest difficulty, they are not very likely evolutionary
outcomes. Defecting equilibria are much more likely evolutionary outcomes. 
The directed punishment of moralistic strategies means that a small number of 
punishers can induce others to cooperate and thus achieve the long-run 
benefits of cooperation. If punishment is cheap enough that a single individual 
can induce all other group members to cooperate, then moralistic strategies 
can increase when rare. However, they can never spread to fixation precisely 
because only a few punishers are necessary, and as punishers become common, 
selection favors free riders who accept the benefits but don’t do the police 
work necessary to generate them. We are quite doubtful that this kind of 
equilibrium is common in human groups. As Hobbes pointed out long ago, 
individual men have a similar capacity for inflicting harm. When I push you 
away from the food, you are likely to push back (weapons probably reduce 
differences in fighting ability—God created men, but Sam Colt made them 
equal, frontiersmen quipped). This problem does not afflict moralistic 
equilibria because defectors are rare and punishers are common. However, 
while moralistic punishment is stable, within-group evolutionary processes do 
not make it a likely evolutionary outcome. The fact that directed punishment 
requires only a few punishers is also responsible for the peculiar nature of 
moralistic equilibria. When moralistic punishers are common, mutant nonpunishers
have no effect on whether the group cooperates; all groups will be
cooperative because there are plenty of punishers everywhere. Thus, while 
such equilibria are stable, individual natural selection has no reason to attach 
such punishment to group-beneficial cooperative behaviors. 

The fact that there are always more than enough punishers at a punisher-cooperator
equilibrium means that such equilibria can be invaded by “second-order
free riders,” individuals who cooperate from the first interaction but
never punish. While much of the debate about moralistic punishment has 
focused on the problem of second-order free riders, we don’t think it is 
a serious obstacle to evolution of cooperation in large groups. First of all, 
“metapunishment,” the punishment of nonpunishers, can evolve. As we show
in chapter 9, this can stabilize punishment. Many people believe that metapunishment
doesn’t actually occur in real human societies. However, even if
this is the case, other solutions to the second-order free rider problem are 
possible. If moralistic punishment is common, and punishments sufficiently 
severe, then cooperating will pay. As a result, most people may go through 
life without having to punish very much. On average, having a predisposition 
to punish may be cheap compared to a disposition to cooperate (in the 
absence of punishment). Thus, relatively weak evolutionary forces can 
maintain a moralistic predisposition. This argument is elaborated in chapter 10 
in which it is shown that very small amounts of conformist social learning can 
stabilize moralistic punishment against second-order free riders, and in 
chapter 13 in which we show that group selection can also stabilize punishment.
Finally, as Eric Smith and his colleagues have pointed out (Smith
and Bliege Bird, 2000), punishing could be used to signal hard-to-observe 
personal qualities, giving punishers a private reward in the mating game, for 
example.

Cultural Group Selection

When we were graduate students during the late 1960s and early 1970s, it was 
quite common for biology texts to explain observed traits in terms of their 
benefit to the population or even the species. Reduced reproductive rates 
prevented overpopulation, and sexual reproduction maintained genetic 
variation necessary for the species to adapt. A key advance in biology over the 
last 40 years was to show that such explanations are mostly wrong. Natural 
selection does not normally lead to the evolution of traits that are for the good 
of the species, or population. With some interesting exceptions, selection 
favors traits that increase the reproductive success of individuals, or sometimes 
individual genes, and when there is a conflict between what is good for the 
individual and what is good for the species, or population, selection usually 
leads to the evolution of the trait that benefits the individual. 

Many people mistakenly believe that this means that group selection is 
never important. In the early 1970s, an eccentric engineer named George Price 
published two articles (1970, 1972) that presented a genuinely new way to 
think about evolution. Price showed that selection can be thought of as a series 
of nested levels: among genes within an individual, among individuals within 
groups, and among groups. He discovered a very powerful mathematical 
formalism, now called the “Price covariance equation,” for describing these 
processes. To keep things simple, let’s suppose that there are two levels. Then 
the change in frequency of a gene undergoing selection is given by 

\[
\Delta q = V_G \beta_G + V_W \beta_W
\]

The first term gives the change due to selection between groups and is the
product of the variance in frequency between groups ($V_G$) and the effect of
a change in the frequency of the gene on the reproductive success of the group
($\beta_G$). This makes sense: $\beta_G$ gives the effect of a change in gene frequency on
group success, and $V_G$ measures how different groups are. The second term,
which gives the change in frequency due to changes within groups, has a similar
form. It is the average over all groups of the product of the variance in
frequency among individuals within the group ($V_W$) and the effect of a change
in the frequency of the gene on the relative fitness of individuals within
groups ($\beta_W$).
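
The between-group and within-group terms can be checked numerically on a toy population. The sketch below is an illustration under assumptions of our own, not a model from the text: groups are equal in size, carriers of the gene have fitness 1 + b*q - c and noncarriers 1 + b*q (q being the gene frequency in the individual's own group), and both terms are divided by mean fitness, a normalization that the simplified equation leaves implicit. All function names are hypothetical.

```python
def price_decomposition(group_freqs, b, c):
    """Between- and within-group components of the change in gene frequency.

    Assumed fitnesses (illustrative): carriers 1 + b*q - c, noncarriers 1 + b*q.
    Groups are assumed to be of equal size.
    """
    m = len(group_freqs)
    q_bar = sum(group_freqs) / m
    v_g = sum((q - q_bar) ** 2 for q in group_freqs) / m  # variance between groups
    v_w = sum(q * (1 - q) for q in group_freqs) / m       # average variance within groups
    beta_g = b - c   # effect of group gene frequency on group mean fitness
    beta_w = -c      # effect of carrying the gene on fitness within a group
    w_bar = 1 + (b - c) * q_bar                           # population mean fitness
    return v_g * beta_g / w_bar, v_w * beta_w / w_bar


def direct_change(group_freqs, b, c):
    """Change in overall frequency computed by brute force, as a check."""
    m = len(group_freqs)
    q_bar = sum(group_freqs) / m
    carrier_offspring = sum(q * (1 + b * q - c) for q in group_freqs)
    all_offspring = sum(1 + (b - c) * q for q in group_freqs)
    return carrier_offspring / all_offspring - q_bar


# Illustrative values: similar groups versus strongly differentiated groups.
for freqs in ([0.45, 0.50, 0.55], [0.05, 0.50, 0.95]):
    between, within = price_decomposition(freqs, b=3.0, c=1.0)
    print(freqs, round(between, 4), round(within, 4),
          round(direct_change(freqs, b=3.0, c=1.0), 4))
```

In this toy setting the within-group term is always negative because the gene is individually costly, so the gene spreads only when the variance between groups is large enough for the between-group term to outweigh it.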

This equation makes it easy to see why selection does not lead to the 
evolution of traits that are beneficial to whole populations if there is any harm 
to individuals. A gene is beneficial to the group when increasing the frequency of 
the gene increases group fitness, or $\beta_G > 0$. If it is costly to the individual, then
$\beta_W < 0$. The magnitude of these two terms depends on the details of the
particular situation; you can’t say anything in general. However, theory tells us
that when groups are large, with even a small amount of migration among them,
the variance between individuals ($V_W$) will be about n times bigger than the
variance between groups ($V_G$; Rogers, 1990). Unless the group benefit is on the
order of n times the cost, selection will eliminate the group-beneficial gene. But 
when this is the case, the trait is individually beneficial averaged over all groups.

However, this doesn’t mean that group selection is unimportant. We have
just seen that when groups of individuals interact over long periods of time, 
any behavior can be evolutionarily stable within groups. Moreover, multiple
stable equilibria can also arise from the conformist tendency in social
learning discussed in chapters 1, 5, and 11. When lots of alternative equilibria
exist, we need a theory that tells us which equilibrium will be the long-run 
evolutionary outcome—what game theorists call the equilibrium selection 
problem. We argue in several articles that selection among groups favors the 
most group-beneficial equilibrium. To see why this is plausible, consider 
the Price equation, and suppose that there are two inherited traits; both are 
stable within groups when common, but one leads to higher rates of group 
reproduction. This means that, as before, $\beta_G > 0$. Because both traits are
favored by selection when they are common, each trait will be favored in some
groups, so that the average value of $\beta_W$ can be either positive or negative.
However, as long as there is not too much migration, most of the groups will 
be near one equilibrium or the other. So the variance among groups will be 
much larger than the variance within groups, independent of group size. The 
reason for this discrepancy is simple: when traits are individually advanta¬ 
geous, selection and migration are working together to make all groups the 
same; the only process making groups different is genetic drift, which depends 
strongly on population size. When there are multiple equilibria, selection is 
driving groups toward different alternative stable equilibria, creating lots of 
stable between-group variation. Thus, selection between groups generates 
group-beneficial outcomes. 

While the Price equation makes it easy to understand the logic of selection 
at the group level, it also conceals crucial details about population structure 
and the mode of intergroup competition. Evolutionary geneticists have stud¬ 
ied a range of population structures ranging from “stepping stone” models 
in which groups exchange migrants with a small number of neighbors to 
“Wright Island” models in which all groups are connected by migration. Such 
models have incorporated two modes of intergroup competition: the group- 
beneficial trait can increase the productivity of the group so that it produces 
more emigrants, called “differential proliferation,” or it can reduce the ex¬ 
tinction rate of the group, called “differential extinction.” The basic conclu¬ 
sion of theoretical work on the evolution of altruism is that these details don’t 
matter much (e.g., Aoki, 1982; Rogers, 1990). However, when there are 
multiple equilibria, the population structure and modes of group competition 
matter a lot. In Boyd and Richerson (1990), we show that when there are 
multiple equilibria, and within-group adaptive processes (selection or selection¬ 
like biased cultural transmission) are strong, the equilibrium with the lowest 
extinction rate spreads under a wide range of conditions. Groups can be large 
and migration rates substantial. The main requirement is that habitats emptied 
by extinction are colonized by individuals drawn mostly from a single group. 
Interestingly, make this a differential proliferation model and group selection 
has no effect. The same process that preserves variation between groups 
prevents a steady trickle of immigrants from groups at the group-beneficial 
equilibrium from having much effect on groups at the other equilibrium. 
Extinction, coupled with recolonization by a single other group, means
that groups become crude “individuals” that reproduce their own group 
characteristics. 

We also wanted to know whether intergroup competition will lead to 
change on the right time scales to explain observed rates of cultural evolution. 
Obviously, this depends on how often groups go extinct. So, working with 
Joseph Soltis, we estimated an upper bound on the rate of cultural evolution 
by this kind of intergroup competition using ethnographic data from New 
Guinea societies. This analysis (chapter 11) indicates that intergroup com¬ 
petition leads to the evolution of group-beneficial cultural traits on 500- to 
1,000-year time scales, too slow to account for much cultural change. On the 
other hand, major change in social institutions is a slow process; witness 
the relatively slow growth in sophistication of complex societies over the past 
5,000 years. The model may apply to conservative aspects of cultural 
change. Much historic and prehistoric cultural change has a time scale of 
a millennium or more. 

Intergroup competition is not the only mechanism that can lead to the 
spread of group-beneficial cultural variants—a propensity to imitate successful 
neighbors can also lead to the spread of group-beneficial variants. Plausibly, 
people often know something about what goes on in neighboring groups. 
Now, suppose that neighboring groups are at different equilibria and that one 
of the equilibria is better, meaning that it makes people in that group better 
off. Then, behaviors could spread from groups at high payoff equilibria to 
neighboring groups at lower payoff equilibria because people imitate their 
more successful neighbors. To see whether this mechanism could actually 
work, we analyzed the model presented in chapter 12, and our results suggest 
that it can lead to the spread of group-beneficial beliefs as long as groups 
are connected to only a small number of neighboring groups (in a stepping 
stone population structure) so that the success of one group can affect 
neighbors enough to cause them to tip from one equilibrium to the other. The 
model also suggests that such spread can be rapid. Roughly speaking, it takes 
about twice as long for a group-beneficial trait to spread from one group to 
another as it does for an individually beneficial trait to spread within a 
group. This process is faster than intergroup competition because it depends 
on the rate at which individuals imitate new strategies, rather than the 
rate at which groups become extinct. 

These models suggest that the evolution of cooperative norms is a side 
effect of rapid, cumulative cultural adaptation. Adaptation by cultural evo¬ 
lution brings significant benefits, especially in the climatic chaos of the later 
Pleistocene epoch. However, it also generates lots of variation between 
groups; thus, group selection is a much more important force in human 
cultural evolution than it is in genetic evolution. We think the best evidence 
from archaeology suggests that humans first began to rely on cumulative 
cultural adaptations roughly a half million years ago. If this inference is 
correct, humans have been living in social environments shaped by group 
selection for a long time. In chapter 14 (with Joe Henrich), we argue that in 
such social environments, ordinary natural selection will favor psychological 
mechanisms like empathy, guilt, and shame that make it more likely that
individuals behave prosocially. The coevolutionary response of our innate social
instincts to the selection pressures of living in rule-bound, prosocial tribal- 
scale communities substantially reshaped our social psychology. 

In chapter 14 we argue that cultural group selection and moralistic 
punishment are both important to explaining cooperation. Cultural group 
selection will favor groups with high frequencies of moralistic punishment, 
and it helps ensure that moralistic punishment enforces functional norms. 
Moralistic punishment, as we have said, plays a considerable role in main¬ 
taining between-group variation on which cultural group selection acts. We 
believe that the tilt of the modeling results and of the empirical data distinctly 
favors what we call in this chapter the tribal social instincts hypothesis. At 
minimum, we believe that the case is sufficiently strong to lift the burden of 
proof that group selection hypotheses have labored under. 
8


The Evolution of Reciprocity 
in Sizable Groups 


Several lines of evidence suggest that sizable groups of people 
sometimes behave cooperatively, even in the absence of external sanctions 
against noncooperative behavior. For example, in many food foraging groups, 
game is shared among all members of the group regardless of who makes the kill 
(e.g., Kaplan and Hill, 1984; Lee, 1979; Damas, 1971). In many other stateless
societies, men risk their lives in warfare with other groups (e.g., Meggit, 1977).
There is also evidence that a great deal of cooperation takes place in contem¬ 
porary state-level societies without external sanctions. For example, people 
contribute to charity, give blood, and vote—even though the effect of their own 
contributions on the welfare of the group is negligible. The groups benefiting are 
often very large and composed of very distantly related individuals. Perhaps the 
most dramatic examples of cooperation in contemporary societies are under¬ 
ground movements such as Poland’s Solidarity in which people cooperate to 
achieve a common goal in opposition to all of the power of the modern state (see 
Olson, 1971, 1982, and Hardin, 1982, for further examples). Because of the
anecdotal nature of these data, it is possible to doubt any particular example.
However, psychologists and sociologists have also shown that people cooperate 
under carefully controlled laboratory conditions, albeit for smaller stakes. For 
example, Marwell and Ames (1978, 1980) presented individual students with
two alternative investments: a low return private investment in which profits 
accrued to the individual, and a higher return investment in which returns ac¬ 
crued to all group members whether they invested or not. Students invested in 
the group-beneficial investment at a much higher rate than that consistent with 
rational self-interest. (See Dawes, 1980, for a review of such experiments.)
The fact that people cooperate in sizable groups is puzzling from an evolutionary
viewpoint. According to contemporary evolutionary theory, cooperative
behavior can evolve only through one of two mechanisms: inclusive fitness 
effects (Hamilton, 1975) or reciprocity (Trivers, 1971). Inclusive fitness effects 
occur when social groups form so that cooperators are more likely to interact with 
other cooperators than with noncooperators. There has been controversy over 
what processes of group formation suffice to allow cooperation. Some authors 
(e.g., Maynard Smith, 1976) have argued that groups must be comprised of 
genetic relatives for cooperation to be favored. Others (e.g., Wilson, 1980; Wade, 
1978) have argued that other mechanisms suffice. We believe that most authors 
would agree that inclusive fitness effects can give rise to cooperation among 
mammals only in relatively small groups. With the exception of humans, this 
prediction is supported by observations of mammalian social behavior. The rel¬ 
atively few animal societies that have levels of cooperation similar to those of 
humans are typically composed of close relatives (Wilson, 1975; Jarvis, 1981), 
while cooperation in large groups among humans includes cases where co- 
operators are virtually unrelated. 

Cooperation may also arise through reciprocity when individuals interact 
repeatedly. Several related analyses (Axelrod, 1984; Axelrod and Hamilton, 
1981; Brown, Sanderson, and Michod, 1982; Aoki, 1983; Peck and Feldman, 
1986) suggest that cooperation can arise via reciprocity when pairs of individuals 
interact repeatedly. These results suggest that the evolutionary equilibrium in this 
setting is likely to be a contingent strategy with the general form “cooperate the 
first time you interact with another individual, but continue to cooperate only if 
the other individual also cooperates.” Some authors have conjectured that reci¬ 
procity can lead to cooperation in larger groups through a similar mechanism 
(Trivers, 1971; Flinn and Alexander, 1982; Alexander, 1985, 1987:93ff). How¬ 
ever, since there has been no explicit theoretical treatment of the evolution of 
behavior when there are repeated interactions in groups larger than two in¬ 
dividuals, it is unclear whether this conjecture is correct. 

The goal of this chapter is to clarify this issue by extending existing theory to 
explicitly include repeated interactions in large groups. We begin by reviewing 
the evolutionary models of the evolution of reciprocity. We then present a model 
of the evolution of reciprocal cooperation in sizable groups. An analysis of this 
model suggests that the conditions necessary for the evolution of reciprocity 
become extremely restrictive as group size increases. 


Models of the Evolution of Reciprocal Cooperation 

For the most part, evolutionary models of cooperation have been developed 
by biologists interested in explaining cooperative behavior among nonhuman 
animals. (See Wade, 1978; Uyenoyama and Feldman, 1980; Michod, 1982; 
Wilson, 1980, for reviews). These assume that individual differences in social 
behavior, including the strategies that govern individual behavior in potentially 
reciprocal social interactions, are affected by heritable genetic differences. They 
further assume that the outcome of potentially cooperative social interactions 
affects an individual’s reproductive success. Successful behavioral strategies will,
thus, increase in the population through natural selection. The question then is: 
under what conditions will natural selection favor behavioral strategies that lead 
to cooperation? The answer to this question should illuminate contempo¬ 
rary human cooperation to the extent that evolved propensities shape human 
behavior. 

If behavioral strategies are transmitted culturally instead of genetically, evo¬ 
lutionary models also provide insight into the conditions under which coopera¬ 
tive behavior will arise in contemporary societies. Some authors (Axelrod, 1984; 
Brown et al., 1982, Maynard Smith, 1982; Pulliam, 1982; Boyd and Richerson, 
1982, 1985) have constructed models, formally quite similar to the genetic ones, 
which assume that behavioral strategies are transmitted from one individual to 
another culturally, by teaching, imitation, or some other form of social learning. 
These models assume that the probability that a strategy is transmitted culturally 
is proportional to the average payoff associated with that strategy. There are many 
plausible ways in which this can occur. For example, it may be that people tend to 
imitate wealthy or otherwise successful individuals. (For discussions of the rela¬ 
tionship between genetic and cultural evolution, see Cavalli-Sforza and Feldman, 
1981; Lumsden and Wilson, 1981; and Boyd and Richerson, 1985). 

The recent work of several authors (Boorman and Levitt, 1980; Axelrod, 
1980, 1984; Axelrod and Hamilton, 1981; Brown et al., 1982; Aoki, 1983; Peck
and Feldman, 1986; Boyd and Lorberbaum, 1987) suggests that natural selection 
may favor reciprocity when pairs of individuals interact a sufficiently large number 
of times. These models share many common features. Each assumes a population 
of individuals. Pairs of individuals sampled from this population interact a num¬ 
ber of times. During each interaction, individuals may either cooperate (C) or 
defect (D). Table 8.1 gives the incremental effect of each interaction on the fitness 
of the members of a pair. This pattern of fitness payoffs defines a single period 
prisoner’s dilemma; it means that cooperative behavior is altruistic in the sense 
that it reduces the fitness of the individual performing the cooperative behavior, 
but increases fitness of the other individual in the pair (Axelrod and Hamilton, 
1981; Boyd, 1988). By assumption, each individual is characterized by an 


Table 8.1. The incremental effect of interactions on the fitness
of the members of a pair

                          Player 2
                          C          D
Player 1         C      R, R       S, T
                 D      T, S       P, P

Each player has the choice of two strategies, C for cooperate and D for
defect. The pairs of entries in the table are the payoffs for players 1 and 2,
respectively, associated with each combination of strategies. In the case of the
prisoner’s dilemma it is assumed that $T > R > P > S$ and $2R > S + T$.
inherited strategy that determines how it will behave. Strategies may be fixed
rules like unconditional defection (“always defect”), or contingent ones like tit- 
for-tat (“cooperate during the first interaction; subsequently do whatever the 
other individual did last time”). The pair’s two strategies determine the effect of 
the entire sequence of interactions on each pair member’s fitness. 
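
To make the last point concrete, the following sketch plays two strategies
against each other for a fixed number of interactions and adds up the payoffs
from table 8.1. It is an illustration only: the numerical payoffs (T = 5, R = 3,
P = 1, S = 0) are assumed values chosen to satisfy the prisoner’s dilemma
inequalities, the fixed number of rounds is a simplification, and the function
names are hypothetical.

```python
# Illustrative payoffs satisfying T > R > P > S and 2R > S + T (assumed values).
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}


def always_defect(own_history, other_history):
    """Unconditional defection."""
    return "D"


def tit_for_tat(own_history, other_history):
    """Cooperate first, then copy the partner's previous move."""
    return other_history[-1] if other_history else "C"


def play(strategy1, strategy2, rounds):
    """Return the total payoff to each player over a fixed number of rounds."""
    history1, history2 = [], []
    total1 = total2 = 0
    for _ in range(rounds):
        move1 = strategy1(history1, history2)
        move2 = strategy2(history2, history1)
        pay1, pay2 = PAYOFF[(move1, move2)]
        total1 += pay1
        total2 += pay2
        history1.append(move1)
        history2.append(move2)
    return total1, total2


print(play(tit_for_tat, tit_for_tat, 10))      # mutual cooperation throughout
print(play(tit_for_tat, always_defect, 10))    # exploited once, then mutual defection
```

With these assumed payoffs, two tit-for-tat players each earn 30 over ten
interactions, while a tit-for-tat player paired with an unconditional defector
earns 9 and the defector earns 14.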

This literature produces three main conclusions about the evolution of 
reciprocity: 

1. Reciprocating strategies, like tit-for-tat, that lead to mutual cooperation 
are successful if pairs of individuals are likely to interact many times. There is 
some dispute about what kinds of reciprocating strategies are most likely to be 
successful, and whether any pure strategy can be evolutionarily stable (Boyd and 
Lorberbaum, 1987; Hirshleifer and Martinez Coll, 1988). But it seems plausible 
there will be a stable equilibrium at which reciprocators are common whenever 
interactions last long enough. 

2. A population in which unconditional defection is common can resist 
invasion by cooperative strategies under a wide range of conditions. When a 
population is mostly made up of individuals who never cooperate, and in¬ 
dividuals are paired randomly, rare reciprocators are overwhelmingly likely to be 
paired with unconditional defectors. Reciprocators suffer because of their will¬ 
ingness to cooperate initially. In many situations, it is plausible that cooperative 
behavior is the derived condition. Thus, to explain the existence of reciprocal 
behavior, we must solve the puzzle of how reciprocating strategies increase 
when rare. 

3. There seems to be a variety of plausible mechanisms that allow recip¬ 
rocating strategies to increase when rare. Axelrod and Hamilton (1981; Axelrod, 
1984) have shown that a very small degree of assortative group formation, when 
coupled with the possibility of prolonged reciprocity, allows strategies like tit- 
for-tat to invade noncooperative populations. Peck and Feldman (1986) have 
shown that the costs of cooperative behavior can be frequency dependent in such 
a way that cooperation increases when rare. Finally, Boyd and Lorberbaum 
(1987) show that if mutation or phenotypic variation is present, unconditional 
defection can be invaded even when groups are formed at random. 
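
Conclusions 1 and 2 can be checked for the simplest pair of strategies,
tit-for-tat and unconditional defection, using the payoffs of table 8.1. The
derivation below is a sketch under an added assumption: after each interaction
the pair interacts again with a constant probability w, so that expected payoffs
are geometric sums. Comparing only these two strategies gives a necessary
condition, not a full stability analysis.

```latex
% Tit-for-tat common: residents earn R every period;
% a rare unconditional defector earns T once and P thereafter.
\frac{R}{1-w} \;>\; T + \frac{wP}{1-w}
\quad\Longleftrightarrow\quad
w \;>\; \frac{T-R}{T-P}

% Unconditional defection common: residents earn P every period;
% a rare tit-for-tat player earns S once and P thereafter.
\frac{P}{1-w} \;>\; S + \frac{wP}{1-w}
\quad\Longleftrightarrow\quad
P \;>\; S
```

The first inequality holds only when interactions are likely enough to continue,
which is conclusion 1. The second holds for every value of w, which is why a
population of unconditional defectors resists invasion under random pairing, as
stated in conclusion 2.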

This theory suggests a robust conclusion: lengthy paired interactions favor 
reciprocity. We have suspected that this conclusion is sensitive to group size, for 
in larger groups, enforcing individuals bear the full cost of punishing defectors 
while the benefit of enforcement flows to the whole group. (See Boyd and 
Richerson, 1985, 228-230, for a simple game-theoretic presentation of this in¬ 
tuition.) Authors like R. D. Alexander (1985, 1987:93ff), however, have argued 
that reciprocity can lead to cooperation in sizable groups. Thus, we offer an 
explicit investigation of repeated interactions in groups larger than two. 


Model Assumptions 

Our model closely resembles evolutionary models of reciprocity in pairs. Sup¬ 
pose there is a population of individuals—each characterized by an inherited 
strategy. Groups are formed by sampling n individuals from the population who 
interact in a repeated n-person prisoner’s dilemma. Each individual’s payoff
depends on his strategy and the strategies used by the $n - 1$ other individuals
in the group. The representation of any strategy in the next generation is a
monotonically increasing function of the average payoff received by individuals
playing that strategy during the previous period. (As argued by Brown et al.,
1982, this assumption is consistent with haploid genetic inheritance of strategies
and some simple forms of cultural transmission.) We then ask which strategies or
combinations of strategies can persist.

We use an n-person prisoner’s dilemma to model cooperation among a
group of individuals (e.g., Schelling, 1978; Taylor, 1976; for alternative
formulations, see Taylor and Ward, 1982; Hirshleifer, 1983). In any time period,
each individual can choose either to cooperate (C) or to defect (D). An
individual’s payoff in a single time period depends on her own behavior and on the
number of cooperators in the group. Let $V(C \mid i)$ and $V(D \mid i)$ be the payoffs to
individuals choosing cooperation and defection, given that i of the n individuals
in the group choose cooperation. The n-person prisoner’s dilemma demands that
these payoffs have the following properties:

1. In any interaction, each individual is better off choosing D, no matter
what the other $n - 1$ individuals in the group choose. Thus:

$$V(D \mid i) > V(C \mid i + 1), \qquad i = 0, \ldots, n - 1 \tag{1}$$

This assumption formalizes the notion that altruistic behavior is costly to the 
individual. If groups are formed at random, and interact only once, this as¬ 
sumption guarantees that cooperative behavior cannot evolve (Nunney, 1985). 

2. If an individual switches from defection to cooperation, every other 
member of the group is better off. This requires that: 


$$\begin{aligned} V(D \mid i + 1) &> V(D \mid i) \\ V(C \mid i + 1) &> V(C \mid i) \end{aligned} \qquad i = 0, \ldots, n - 1 \tag{2}$$


This assumption formalizes the idea that cooperation benefits other members of 
the group. 

3. The average fitness of individuals in the group increases if one switches 
from defection to cooperation. This requires: 


$$(i + 1)\,V(C \mid i + 1) + (n - i - 1)\,V(D \mid i + 1) > i\,V(C \mid i) + (n - i)\,V(D \mid i) \tag{3}$$

where $i = 0, \ldots, n - 1$. This assumption formalizes the idea that the fitness
benefits to the whole group from cooperative behavior exceed the fitness costs of 
cooperating. 

We are free to choose the units in which payoffs are accounted. We can thus 
specify that $V(D \mid 0) = 0$ and $V(C \mid n) = B$, where B is a positive constant. When groups
consist of only two individuals, these three conditions generate a slightly stronger form
of the prisoner’s dilemma than usual. That is, all three require that $T > R$, $P > S$, and
$R > (T + S)/2 > P$ rather than the two inequalities listed in table 8.1.

We derive many of our results here assuming that the payoff to each
individual in a group during each interaction is a linear function of the number of
individuals who cooperated during that interaction. Let the number of
individuals choosing C during a particular turn be i. Then, the payoffs to individuals
choosing C and D are:

$$V(C \mid i) = (B/n)\,i - c \qquad \text{and} \qquad V(D \mid i) = (B/n)\,i \tag{4}$$

From the definition of the n-person prisoner’s dilemma, it must be that $B > c$ and
$c > B/n$. This model is identical to the linear model of social interactions used in
most kin selection models. Economists and political scientists have used various 
versions of this model to represent the investment in public goods (Hardin, 
1982), although Hirshleifer (1983) shows that nonlinear payoffs can strongly 
affect the advantages of cooperation. Two polar cases of the linear payoff model 
are of particular interest: the case in which B is constant with respect to n, and 
the case in which B is proportional to n. The first represents situations in which 
the benefits produced by a cooperative act are divided up among group mem¬ 
bers, so that increasing group size decreases the benefit per individual group 
member. The second case represents situations in which the benefits reaped by 
one individual do not reduce the benefits received by another. 
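
As a check on the parameter restrictions for the linear model, the sketch below
tests conditions (1) through (3) directly for every value of i. It is an
illustration rather than part of the original analysis; the function names and
the numerical values of n, B, and c are hypothetical.

```python
def v_c(i, n, B, c):
    """Payoff to a cooperator when i of the n group members cooperate (equation 4)."""
    return (B / n) * i - c


def v_d(i, n, B, c):
    """Payoff to a defector when i of the n group members cooperate (equation 4)."""
    return (B / n) * i


def satisfies_npd(n, B, c):
    """Check conditions (1)-(3) of the n-person prisoner's dilemma for i = 0..n-1."""
    for i in range(n):
        # (1) Defecting always pays better than cooperating.
        defect_dominates = v_d(i, n, B, c) > v_c(i + 1, n, B, c)
        # (2) One more cooperator makes every other group member better off.
        others_gain = (v_d(i + 1, n, B, c) > v_d(i, n, B, c)
                       and v_c(i + 1, n, B, c) > v_c(i, n, B, c))
        # (3) One more cooperator raises the group's total (hence average) payoff.
        group_gains = ((i + 1) * v_c(i + 1, n, B, c)
                       + (n - i - 1) * v_d(i + 1, n, B, c)
                       > i * v_c(i, n, B, c) + (n - i) * v_d(i, n, B, c))
        if not (defect_dominates and others_gain and group_gains):
            return False
    return True


print(satisfies_npd(n=5, B=10.0, c=3.0))   # B > c > B/n
print(satisfies_npd(n=5, B=10.0, c=1.0))   # c < B/n: cooperating pays individually
print(satisfies_npd(n=5, B=2.0, c=3.0))    # c > B: cooperation lowers average payoff
```

Only the first parameter set, which satisfies both B > c and c > B/n, passes all
three conditions; the other two calls print False.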

Groups of n individuals are sampled from the population and interact
repeatedly in the n-person prisoner’s dilemma just described. The probability that
a given group interacts more than t times is $w^t$, where w is a constant between
zero and one. This assumption means that the expected number of interactions
among the n individuals is $1/(1 - w)$. Thus, as w increases, so does the number of
interactions between a group of n individuals. If $w \approx 0$, individuals usually
interact only once. If $w \approx 1$, then individuals interact many times.
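
The expected number of interactions follows from these continuation
probabilities. In the short derivation below, N denotes the random number of
times a group interacts; this symbol is introduced here only for the derivation.

```latex
\Pr(N > t) = w^{t}, \quad t = 0, 1, 2, \ldots
\qquad\Longrightarrow\qquad
E[N] \;=\; \sum_{t=0}^{\infty} \Pr(N > t)
     \;=\; \sum_{t=0}^{\infty} w^{t}
     \;=\; \frac{1}{1-w}
```

For example, w = 0.5 gives an expected 2 interactions, and w = 0.9 gives an
expected 10.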

Each individual is characterized by an inherited “strategy” that specifies 
whether the individual will choose cooperation or defection during any time 
period based on the history of the group up to that point. In this analysis, we 
consider only the following strategies: 

U: always defect. 

$T_a$: cooperate on the first move and then cooperate on each subsequent
move if a or more of the other $n - 1$ individuals in the group chose
cooperation during the previous time period.

The set of strategies $T_a$ is a generalization of tit-for-tat. In the n-person case,
there are $n - 1$ such contingent strategies ($T_a$ with $a = 1, \ldots, n - 1$), one for each
of the possible rules of the form “cooperate if a or more individuals cooperated
on the last move.” Taylor (1976) introduced this family of strategies. We begin
by assuming that populations consist of only two strategies, U and $T_a$, in which a
takes on some particular value. Later we will consider populations in which
three or more strategies are present.
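
The behavior of these strategies in a single group can be traced with a short
simulation. The sketch below is illustrative only: it assumes a group of k
reciprocators sharing the same threshold a and n - k unconditional defectors, and
it records how many individuals cooperate in each period. The function and
variable names are hypothetical.

```python
def simulate_group(n, k, a, rounds):
    """Number of cooperators in each round for a group of k T_a and n - k U players."""
    is_reciprocator = [True] * k + [False] * (n - k)
    # First move: T_a players cooperate, U players defect.
    cooperated = list(is_reciprocator)
    history = [sum(cooperated)]
    for _ in range(rounds - 1):
        total = sum(cooperated)
        # A T_a player cooperates if at least a of the OTHER n - 1 players
        # cooperated in the previous round; U players never cooperate.
        cooperated = [rec and (total - int(did)) >= a
                      for rec, did in zip(is_reciprocator, cooperated)]
        history.append(sum(cooperated))
    return history


print(simulate_group(n=5, k=3, a=2, rounds=4))   # each T_2 sees 2 other cooperators
print(simulate_group(n=5, k=2, a=2, rounds=4))   # each T_2 sees only 1 other cooperator
```

In the first group cooperation persists in every round ([3, 3, 3, 3]); in the
second it collapses after the first round ([2, 0, 0, 0]), because each
reciprocator sees fewer than a other cooperators.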

In populations in which only U and $T_a$ are present, an individual’s expected
fitness depends only on its own strategy and on the number of $T_a$ individuals
among the other $n - 1$ individuals in its social group. To see this, consider the
expected fitness of a $T_a$ individual in a group in which j other individuals use the
strategy $T_a$. The U individuals in such a group always play D. The $T_a$ individuals
always cooperate on the first interaction. They continue to cooperate as long as
a or more of the other $n - 1$ individuals cooperated last time. If $j \geq a$, the $T_a$
individuals play C during every interaction. This means that during each time
period the payoff to $T_a$ individuals is $V(C \mid j + 1)$. The effect of social interaction
on the fitness of any particular individual depends on the number of time periods
that individual’s group interacts. If $j \geq a$, the average payoff to $T_a$ individuals
over all groups with j other cooperators, $F(T_a \mid j)$, is:


$$F(T_a \mid j) = V(C \mid j + 1)\,(1 + w + w^2 + w^3 + \cdots) = \frac{V(C \mid j + 1)}{1 - w} \tag{5}$$


If $j < a$, the $T_a$ individuals cooperate during the first interaction and defect
thereafter. This means that the payoff to $T_a$ individuals is $V(C \mid j + 1)$ during the
first period and $V(D \mid 0)$ during any subsequent periods. Thus,


$$F(T_a \mid j) = V(C \mid j + 1) + V(D \mid 0)\,(w + w^2 + w^3 + \cdots) = V(C \mid j + 1) + \frac{w\,V(D \mid 0)}{1 - w} \tag{6}$$


A similar argument shows that the expected payoff to U individuals in groups in
which j of the other $n - 1$ individuals are characterized by the strategy $T_a$ is as
follows:


$$F(U \mid j) = \begin{cases} \dfrac{V(D \mid j)}{1 - w} & j > a \\[2ex] V(D \mid j) + \dfrac{w\,V(D \mid 0)}{1 - w} & j \leq a \end{cases} \tag{7}$$
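
The expected payoffs for the two strategies can be written out as small
functions. The sketch below is an illustration under stated assumptions: it uses
the linear payoff model, reads the thresholds as j >= a for a reciprocator and
j > a for a defector (a defector is not itself a cooperator, so its
reciprocating groupmates each see only j - 1 other cooperators), and uses
hypothetical function names and parameter values.

```python
def v_c(i, n, B, c):
    """Cooperator's one-period payoff in the linear model."""
    return (B / n) * i - c


def v_d(i, n, B, c):
    """Defector's one-period payoff in the linear model."""
    return (B / n) * i


def f_reciprocator(j, a, n, B, c, w):
    """Expected payoff to a T_a individual with j other T_a individuals in its group."""
    if j >= a:
        # Cooperation is sustained for as long as the group persists.
        return v_c(j + 1, n, B, c) / (1 - w)
    # Cooperate once, then everyone defects in all later periods.
    return v_c(j + 1, n, B, c) + w * v_d(0, n, B, c) / (1 - w)


def f_defector(j, a, n, B, c, w):
    """Expected payoff to a U individual with j T_a individuals among the others."""
    if j > a:
        # Each T_a individual still sees at least a other cooperators.
        return v_d(j, n, B, c) / (1 - w)
    # The T_a individuals cooperate once and then defect.
    return v_d(j, n, B, c) + w * v_d(0, n, B, c) / (1 - w)


args = dict(a=2, n=5, B=10.0, c=3.0, w=0.5)
print(f_reciprocator(2, **args), f_reciprocator(1, **args))
print(f_defector(3, **args), f_defector(2, **args))
```

With these assumed values a reciprocator earns 6.0 when exactly a = 2 others
reciprocate but only 1.0 when one other does, while a defector earns 12.0 among
three reciprocators and 4.0 among two.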


After the episode of social behavior that generates these payoffs, individuals
in the population reproduce. We assume that individual fitness is the sum of a
baseline fitness $W_0$ and the payoff resulting from social interaction. We further
assume that $W_0 \gg F(T_a \mid j), F(U \mid j)$ for all values of j, meaning that selection
acting on social behavior is weak. The expected fitness of $T_a$ averaged over all the
different kinds of groups, $W(T_a)$, is given by:

$$W(T_a) = \sum_{j=0}^{n-1} m(j \mid T_a)\,\{W_0 + F(T_a \mid j)\} \tag{8}$$

The term in braces is the expected fitness of a $T_a$ individual in a group with j other
$T_a$ individuals. This term is multiplied by the probability that a $T_a$ individual finds
herself in such a group, $m(j \mid T_a)$, and is summed over all possible groups.
Similarly, the expected fitness of an unconditional defector, $W(U)$, is the following:

$$W(U) = \sum_{j=0}^{n-1} m(j \mid U)\,\{W_0 + F(U \mid j)\} \tag{9}$$

where $m(j \mid U)$ is the probability that a U individual finds herself in a group in
which there are j $T_a$ individuals.
If the frequency of $T_a$ in the population before social interaction is p, then
the frequency before social interaction in the next generation, $p'$, is:

$$p' = p + p(1 - p)\,\frac{W(T_a) - W(U)}{\bar{W}} \tag{10}$$

where

$$\bar{W} = p\,W(T_a) + (1 - p)\,W(U)$$

To determine the long-run evolutionary outcome, we determine what
frequencies of $T_a$ and U represent stable equilibria of the recursion (10).


Evolution of Reciprocity When Groups Are Formed Randomly


We begin by assuming that groups form randomly. This assumption means that 
individuals do not interact with genetic relatives, nor are they able to assort 
themselves based on observable phenotypic characteristics. In the special case 
of pairs, theory (reviewed earlier) suggests that strategies leading to reciprocal 
cooperation can evolve as long as individuals interact a large enough number of 
times. We want to know how increasing group size will affect this conclusion. 
We formalize this assumption by specifying that both $m(j \mid T_a)$ and $m(j \mid U)$ are
binomial probability distributions with parameter p, labeled $m(j)$.
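
To see how the pieces fit together, the sketch below iterates the recursion for
the frequency of reciprocators under random group formation. It is an
illustration under several assumptions that are not in the original analysis:
the linear payoff model with V(D|0) = 0, thresholds read as j >= a for
reciprocators and j > a for defectors, a binomial sample of the other n - 1 group
members, and made-up values for a, n, B, c, w, and the baseline fitness. All
function names are hypothetical.

```python
from math import comb


def payoffs(j, a, n, B, c, w):
    """Return (F(T_a|j), F(U|j)) for the linear payoff model with V(D|0) = 0."""
    v_c_next = (B / n) * (j + 1) - c   # V(C | j + 1)
    v_d_now = (B / n) * j              # V(D | j)
    f_t = v_c_next / (1 - w) if j >= a else v_c_next
    f_u = v_d_now / (1 - w) if j > a else v_d_now
    return f_t, f_u


def mean_fitness(p, a, n, B, c, w, w0):
    """Return (W(T_a), W(U)) when the other n - 1 members are a binomial sample."""
    w_t = w_u = 0.0
    for j in range(n):
        m_j = comb(n - 1, j) * p**j * (1 - p)**(n - 1 - j)
        f_t, f_u = payoffs(j, a, n, B, c, w)
        w_t += m_j * (w0 + f_t)
        w_u += m_j * (w0 + f_u)
    return w_t, w_u


def iterate(p, generations, **params):
    """Apply the frequency recursion repeatedly and return the final frequency of T_a."""
    for _ in range(generations):
        w_t, w_u = mean_fitness(p, **params)
        w_bar = p * w_t + (1 - p) * w_u
        p = p + p * (1 - p) * (w_t - w_u) / w_bar
    return p


params = dict(a=4, n=5, B=10.0, c=3.0, w=0.9, w0=100.0)
print(iterate(0.2, 2000, **params))   # starts with reciprocators fairly rare
print(iterate(0.5, 2000, **params))   # starts with reciprocators fairly common
```

With these made-up values the fitness difference changes sign near p = 0.36, so
the run that starts at 0.2 declines toward zero and the run that starts at 0.5
rises toward one.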

According to equation (10), the frequency of $T_a$ will increase whenever the
expected fitness of $T_a$, $W(T_a)$, is greater than the expected fitness of U, $W(U)$
(unless the population is at an equilibrium point, in which case there is no
change). When groups are formed at random, the condition for $T_a$ to increase has
the following particularly simple and instructive form:

$$\sum_{j=0}^{a-1} \bigl[V(D \mid j) - V(C \mid j + 1)\bigr]\,m(j) + \sum_{j=a+1}^{n-1} \frac{\bigl[V(D \mid j) - V(C \mid j + 1)\bigr]\,m(j)}{1 - w} < \left[\frac{V(C \mid a + 1)}{1 - w} - V(D \mid a)\right] m(a) \tag{11}$$


where if the upper bound of the sum is less than the lower bound, the sum is zero
by convention. This expression says that $T_a$ individuals have a fitness advantage
relative to U individuals only in groups in which a single additional defector will
cause cooperation to collapse. For $T_a$ to be favored by selection, the advantage it
gains in such groups must be larger than the disadvantage $T_a$ suffers in all other
groups. To see this, consider each of the three terms in this expression. The first
term represents the sum of the fitness advantages of U individuals in groups in
which fewer than a of the other $n - 1$ individuals are reciprocators, weighted by
the probability that such groups form. In such groups, $T_a$ individuals cooperate
only once, and U individuals do not cooperate at all. The definition of the
n-person prisoner’s dilemma guarantees that $V(C \mid j + 1) < V(D \mid j)$. This term is
therefore always positive. The second term represents the average fitness
advantage of unconditional defection in groups in which more than a of the other
$n - 1$ individuals are reciprocators. This term is multiplied by $1/(1 - w)$, the
expected number of interactions, because in such groups $T_a$ individuals cooperate
and U individuals defect for as long as the group persists. Again, this term is
always positive. The right-hand side of the inequality gives the difference
between the fitness of the two strategies in groups in which exactly a of the other
$n - 1$ individuals are reciprocators, multiplied by the probability that such groups
form. A $T_a$ individual in such a group both cooperates and receives the benefits of
cooperation of a other cooperators, $V(C \mid a + 1)$, for as long as the group persists.
Replacing that $T_a$ individual with a U individual causes other reciprocators to
cease cooperating after the first interaction. This term cannot be positive unless
the fitness of a cooperator in such a group is greater than the fitness of a defector
in a group of n defectors, that is, $V(C \mid a + 1) > 0$. Suppose that this condition is
satisfied. Then, if the expected number of interactions is large enough (i.e., w is
close enough to one), $T_a$ individuals will have an advantage relative to U
individuals in groups in which a of the other $n - 1$ individuals are reciprocators.
For $T_a$ to be favored by selection, the advantage that $T_a$ individuals gain in such
groups must exceed the advantage to U individuals in all other groups.

With this result in mind, consider the equilibrium behavior of this model. 
The frequency of the two strategies in the population will not change when 
p' = p. Values of p that satisfy this condition are equilibrium values, denoted p̂.
Since there is no migration or mutation, p̂ = 1 (all T_a individuals) and p̂ = 0 (all U
individuals) are always equilibrium values of equation (10). There may be other
equilibria at which both U and T_a are present in the population. At these
polymorphic (or “interior”) equilibria, the average fitness of the two strategies 
must be equal. An equilibrium is stable if the population returns to the same 
equilibrium frequency after small perturbations. Stable equilibria are interesting 
because they tell us something about what kinds of strategies, or mixes of 
strategies, can persist in the long run. Unstable equilibria are also interesting 
because they give information about the range of initial conditions that can result 
in various long-run outcomes. Such an analysis yields the following results. 

A population in which U is common can resist invasion by any reciprocating
strategy, T_a. This is true for all values of w. As in the two-person case, a
population that is all unconditional defectors can resist invasion by any reciprocal
strategy we consider. When unconditional defection is very common and groups
are formed randomly, most groups contain n unconditional defectors. The
few T_a individuals in the population will be in groups in which all other
individuals are unconditional defectors. These solitary reciprocators cooperate
once and thereafter defect. The average fitness of unconditional defectors will
always be higher than the average fitness of any reciprocal strategy, because
V(D | 0) > V(C | 1).

A population in which T_{n-1} is common can resist invasion by unconditional
defection if, and only if, w is sufficiently large. It is the only reciprocal strategy that
has this property. T_{n-1} is the reciprocating strategy that is completely intolerant
of defection. Individuals using T_{n-1} will cooperate only if every other individual
cooperated during the previous time period. Strategies that continue to cooperate
despite one or more defections (T_a, 0 < a < n - 1) cannot be evolutionarily
stable when groups form randomly. When T_a is common, the great majority
of unconditional defectors will be isolated in groups in which the other n - 1
individuals are all reciprocators. Unless a = n - 1, the T_a individuals in such
groups will continue to cooperate despite the defector. Since V(D | n - 1) > V(C | n),
unconditional defectors will have higher average fitness than reciprocators.

The parameter w is a measure of the number of times that individuals 
interact in groups. T_{n-1} is evolutionarily stable only if:

\[
w > w_c = 1 - \frac{V(C \mid n)}{V(D \mid n-1)} \tag{12}
\]

This relationship has a simple interpretation. Consider an individual in a group in
which all other individuals use the strategy T_{n-1}. If this individual defects on every
turn, his payoff will be V(D | n - 1) in the first time period and V(D | 0) = 0
thereafter. If he instead cooperates, his payoff is V(C | n) every period. Because the
average number of interactions is 1/(1 - w), condition (12) requires that the
average payoff from choosing cooperation be greater than the average payoff from
choosing defection if cooperation is to resist invasion by individuals using U.
More iterations mean more chance of satisfying this condition, all else being equal. 
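
Condition (12) can be checked numerically. The following is a minimal sketch that assumes the linear payoff schedule used later in the chapter, V(C | j) = Bj/n - c and V(D | j) = Bj/n, where j is the number of cooperators in the group. The function names and the parameter values in the usage note are illustrative assumptions, not taken from the text.

```python
def payoff_cooperate(j, n, B, c):
    """V(C | j): per-period payoff to a cooperator when j group members cooperate.

    Assumes linear payoffs: each cooperator adds B/n to everyone and pays c.
    """
    return B * j / n - c


def payoff_defect(j, n, B):
    """V(D | j): per-period payoff to a defector when j group members cooperate."""
    return B * j / n


def critical_w(n, B, c):
    """Threshold w_c from condition (12): w_c = 1 - V(C | n) / V(D | n - 1)."""
    return 1.0 - payoff_cooperate(n, n, B, c) / payoff_defect(n - 1, n, B)


def long_run_payoffs(n, B, c, w):
    """Expected totals for one individual whose n - 1 partners all use T_{n-1}.

    Cooperating earns V(C | n) in each of the 1/(1 - w) expected periods.
    Defecting earns V(D | n - 1) once and V(D | 0) = 0 thereafter.
    """
    cooperate_total = payoff_cooperate(n, n, B, c) / (1.0 - w)
    defect_total = payoff_defect(n - 1, n, B)
    return cooperate_total, defect_total
```

For example, with the hypothetical values n = 4, B = 2, and c = 1, `critical_w(4, 2.0, 1.0)` is 1/3, and `long_run_payoffs` favors cooperation at w = 0.5 but not at w = 0.2.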

Assuming linear payoffs, the domain of attraction of T_{n-1} diminishes rapidly as
group size increases. If pairs of individuals interact long enough, either
unconditional defection or T_{n-1} can persist. How likely is it that a population will end up
at the cooperative equilibrium? One approach to answering this question is to
determine the domain of attraction of the two equilibria. An equilibrium’s
domain of attraction is the set of initial frequencies that begin trajectories ending at
that equilibrium. The bigger the domain of attraction of an equilibrium, the 
more likely it is, in some sense, that a population will end up there. (Later we 
will consider a second approach to answering this question.) 

We have not been able to determine the domains of attraction for the two 
fixed equilibria in general. We have found them, however, in the special case 
in which the payoffs are linear functions of the number of defectors. Only two 
stable equilibria exist in this special case, p = 0 and p = 1. There is also a single 
unstable polymorphic equilibrium. The frequency of reciprocators at the internal
equilibrium, p_c, is (Appendix, part 1):

\[
p_c = \left( \frac{c - B/n}{w(B - c)/(1 - w)} \right)^{1/n} \tag{13}
\]


If the initial frequency is higher than p_c, then the population eventually will
consist of all reciprocating (T_{n-1}) individuals. If the initial frequency of
cooperators is less than p_c, the population eventually will consist of all U
individuals.

To interpret equation (13), remember that the expected fitness of the two
strategies must be equal at any polymorphic equilibrium. The term c - B/n is the
difference in fitness between U and T_{n-1} individuals during the first interaction.
The term w(B - c)/(1 - w) is the fitness advantage of T_{n-1} relative to U when the
other n - 1 members of the group are reciprocators. The critical frequency of
T_{n-1} individuals necessary for selection to favor T_{n-1} thus is simply the ratio of
the incremental benefit to the incremental cost of defecting during the first
interaction raised to the 1/n power. Because the incremental benefit increases as
the expected number of interactions becomes large (i.e., as w → 1), the threshold
frequency of cooperators necessary for cooperation to increase approaches zero
(i.e., p_c → 0). The domain of attraction for the unconditional defection
equilibrium thus shrinks toward zero. Raising the ratio to the power 1/n, however,
means that the threshold frequency of cooperators necessary for cooperators to
be favored, p_c, increases as group size increases. This effect occurs because the
probability of forming cooperative groups diminishes geometrically as group size
increases when groups are formed at random.

Figure 8.1 illustrates the magnitude of this effect by showing the values of p_c
for various parameter combinations. For small groups, cooperators need increase
to only a small fraction of the population for selection to favor cooperation. For
even modest-sized groups, however, the cooperative strategy T_{n-1} must reach
substantial frequency before this strategy increases. For large groups, virtually 
the entire population must consist of cooperators before the cooperative strategy 
can increase. 

In populations composed of T_a (n - 1 > a > 0) and U, there is a single stable,
internal equilibrium as long as w is large enough, c < B(a + 1)/n, and payoffs are
linear. Of the set of reciprocating strategies we have considered, we have found
that only T_{n-1} can resist invasion by rare unconditional defectors (17). We also
found, however, that T_{n-1} is unlikely to increase when rare. It would be
interesting to know whether there are any stable internal equilibria at which more
tolerant cooperative strategies (T_a, a < n - 1) and unconditional defection
coexist. It seems plausible that the threshold frequency necessary to get such
strategies started in a population might be lower.

Figure 8.1. This figure presents the threshold frequency of T_{n-1} that must be exceeded
for this strategy to increase (i.e., p_c) as a function of group size (n) for four values of w:
0.9, 0.99, 0.999, and 0.9999. These values of w correspond to 10, 100, 1000, and 10000
interactions, on average, between pairs of individuals. (a) The incremental benefit to
an individual due to one cooperator is proportional to group size (B = 1.14In); (b) the
incremental benefit is constant (B = 2).

It turns out that there are two internal equilibria, one stable and the other
unstable, as long as (see the Appendix, part 2):

\[
p_d = \frac{B(a+1)/n - c}{B - c} > 0
\]

and

\[
w > \frac{c - B/n}{(c - B/n)\,\mathrm{Prob}(j < a \mid p = p_d) + a(B/n)\,\mathrm{Prob}(j = a \mid p = p_d)} \tag{14}
\]

The frequency of T_a at the stable internal equilibrium, p_s, is always greater than
p_d, and the frequency of T_a at the unstable equilibrium, p_u, is less than p_d. If the
initial frequency of T_a is less than p_u, the population will eventually consist of all
unconditional defectors. When the initial frequency of T_a is greater than p_u, the
frequency of T_a eventually will stabilize at p_s. When w is less than this critical
value, the only equilibria are monomorphic for T_a or U.

Numerical determination of the internal stable equilibria suggests that as a
decreases (1) the frequency of the strategy T_a at the stable internal equilibrium decreases,
(2) the threshold frequency of T_a necessary for T_a to be favored decreases, and (3) the
threshold value of w necessary for the internal equilibria to exist increases. One can
determine the frequency of the two strategies at these polymorphic equilibria by
finding the values of p for which W(U) = W(T_a). Figure 8.2 shows the results
of numerical determinations of these equilibrium values for several combinations
of parameter values. When a is almost n - 1, reciprocators will allow only a few
defectors before defecting themselves. In this case, the frequency of the
reciprocating strategy, T_a, is high, but so is the threshold frequency of T_a necessary to
get cooperation started. Note also that when a is near n - 1, the internal
equilibrium may be stable even when w is fairly small. As a decreases, the
reciprocating strategy tolerates a larger number of defectors. This greater tolerance
decreases both the frequency of the cooperative individuals at the stable equilibrium
and the threshold frequency of T_a necessary to get cooperation started. As a decreases,
w must be large in order for a stable equilibrium to exist at all.
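
The numerical determination described above can be sketched as follows. This is a hedged illustration, not the procedure actually used for Figure 8.2: it assumes linear payoffs (V(C | j) = Bj/n - c, V(D | j) = Bj/n), random (binomial) group formation, and the behavioral rules described earlier for groups with fewer than, exactly, and more than a reciprocators among the other n - 1 members. The function names, the grid-plus-bisection root search, and the parameter values are our own assumptions.

```python
from math import comb


def binomial_pmf(j, m, p):
    """Probability of j successes in m independent trials with success chance p."""
    return comb(m, j) * p**j * (1.0 - p) ** (m - j)


def fitness_difference(p, n, a, B, c, w):
    """W(T_a) - W(U) when groups form at random and payoffs are linear (assumed)."""
    later = w / (1.0 - w)  # expected number of interactions after the first
    diff = 0.0
    for j in range(n):  # j = number of T_a among the other n - 1 group members
        prob = binomial_pmf(j, n - 1, p)
        first_t = B * (j + 1) / n - c  # a T_a individual cooperates in round one
        first_u = B * j / n            # a U individual free-rides on j cooperators
        # T_a keeps cooperating only if at least a others cooperated.
        total_t = first_t + (later * first_t if j >= a else 0.0)
        # With a U in the group, the j reciprocators each see only j - 1 others.
        total_u = first_u + (later * first_u if j > a else 0.0)
        diff += prob * (total_t - total_u)
    return diff


def interior_equilibria(n, a, B, c, w, grid=2000, tol=1e-12):
    """Scan (0, 1) for sign changes of W(T_a) - W(U) and refine each by bisection."""
    roots = []
    lo = 1.0 / grid
    f_lo = fitness_difference(lo, n, a, B, c, w)
    for k in range(2, grid):
        hi = k / grid
        f_hi = fitness_difference(hi, n, a, B, c, w)
        if f_lo * f_hi < 0.0:
            x0, x1, f0 = lo, hi, f_lo
            while x1 - x0 > tol:
                mid = 0.5 * (x0 + x1)
                f_mid = fitness_difference(mid, n, a, B, c, w)
                if f0 * f_mid <= 0.0:
                    x1 = mid
                else:
                    x0, f0 = mid, f_mid
            roots.append(0.5 * (x0 + x1))
        lo, f_lo = hi, f_hi
    return roots
```

With the hypothetical values n = 4, a = 2, B = 2, and c = 1, the search returns two interior equilibria at w = 0.9, one on each side of p_d = 0.5, and none at w = 0.7. The fitness difference is positive only between the two roots, so the lower root is the unstable threshold and the upper root is the stable equilibrium.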

Populations at stable equilibria involving two strategies, T_a and U (n - 1 >
a > 0), can resist invasion by rare individuals using any other reciprocating strategy,
T_b, where a ≠ b. So far we have limited our analysis to populations in which only
two strategies are present. This omission might be important. Assuming w is
sufficiently large, it is relatively easier for cooperation to get started when
cooperating individuals are quite tolerant. But tolerant strategies can achieve only
a low frequency at equilibrium. Suppose that such an equilibrium is reached. If
less tolerant individuals could then invade, the population might reach a new
equilibrium at which cooperators existed in higher frequency. If this could
happen repeatedly, then the cooperators might eventually achieve a high
frequency through a sort of ratchet mechanism.
Figure 8.2. Plots of the two internal equilibria in populations characterized by two
strategies, T_a and U, for various parameter values for n = 32 and B = 2. Part (a) shows
how to determine the values of the two internal equilibria for a given value of 1/(1 - w).
Part (b) shows how these values are affected by changes in the parameter a, the
cooperation threshold of reciprocators. One axis gives the frequency of reciprocators (p).


It turns out, however, that a population at a stable polymorphic equilibrium
involving U and T_a can resist invasion by any other rare reciprocating strategy, T_b.
For a third strategy to invade, its expected fitness must be greater than the fitness
of the two common strategies, which are themselves equal at equilibrium. When the
invading strategy is sufficiently rare, expected fitness of T_b individuals can be
calculated assuming that the other n - 1 individuals in their groups are drawn
from the equilibrium population. It turns out that (see the Appendix, part 3) any
invading type has lower fitness than the common reciprocating strategy, T_a. To
see this, suppose that b > a, so that the invading strategy is less tolerant of
defection than the reciprocating strategy common at the equilibrium. First, recall
that T_a individuals have higher fitness than unconditional defectors only in groups
in which there are a other T_a individuals. In all other groups, unconditional
defectors have the advantage. Now consider the fitness of T_b individuals. If there
are a T_a individuals in the group, a T_b individual does almost as poorly as an
unconditional defector, because her defection causes cooperation to collapse. In
groups with any other composition, T_b individuals either act and thus suffer like
T_a individuals, or they defect after one interaction, thus beating the T_a
individuals but losing to the unconditional defectors. The strategy T_b therefore can
neither capture the benefits of long-term cooperation in groups in which there are
a threshold number of cooperators nor exploit the cooperation of the common
reciprocators as effectively as unconditional defection. The Appendix shows that
a similar logic holds for a > b.


The Evolution of Reciprocity When Groups Form Assortatively 

Nonrandom interaction plays an important role in Axelrod’s (1984) influential
view of the evolution of reciprocity. Like most evolutionary analyses of
reciprocity (but see Peck and Feldman, 1986; Boyd and Lorberbaum, 1987),
Axelrod’s study indicates reciprocating strategies such as tit-for-tat cannot increase
when rare if individuals interact at random. Axelrod shows, however, that
reciprocal strategies can increase when rare if individuals pair assortatively, meaning
that individuals using reciprocating strategies are more likely to be paired with other
reciprocators than chance alone would dictate. In genetic models, such
assortative social interactions could arise if individuals tend to interact with genetic
relatives. If w is large, even a very small amount of assortative interaction will
allow reciprocating strategies to increase. Thus, in the two-person case, there is
a synergistic relationship between kin selection and reciprocity in which small
amounts of kin selection greatly facilitate the evolution of cooperation through
reciprocity. We now consider whether this synergistic relationship changes as
group size increases.

Once again suppose that payoffs are linear and that there are only two
strategies: reciprocators who cooperate as long as a or more others also cooperate,
T_a, and unconditional defection, U. Also suppose that groups are formed so that
the probability that a T_a individual is in a group with j other T_a individuals is:

\[
m(j \mid T_a) = \binom{n-1}{j}\,[r + (1-r)p]^{j}\,[(1-r)(1-p)]^{n-j-1} \tag{15}
\]

where p is the frequency of T_a in the population before group formation, and r is
a measure of assortment (e.g., the relatedness coefficient of kin selection theory).
The probability that an unconditional defector finds himself in a group in which j
of the other n - 1 individuals are T_a is:

\[
m(j \mid U) = \binom{n-1}{j}\,[(1-r)p]^{j}\,[r + (1-r)(1-p)]^{n-j-1} \tag{16}
\]

This model is meant to capture the general notion of assortative group formation
in a mathematically tractable form. It is consistent with some genetic models,
for example, a model in which strategies are inherited as haploid sexual traits and
group members are half siblings. There are many other plausible modes of group
formation that will not yield exactly this pattern of group formation, for
example, groups of full siblings. Because the contingent strategies we consider
cause payoffs to be highly nonlinear functions of the number of reciprocators,
experience with kin selection models (Cavalli-Sforza and Feldman, 1978)
suggests that different patterns of group formation may yield different results. Our
model nonetheless has generality when used to determine the conditions under
which a reciprocating strategy can invade a population in which unconditional defection
is common because many of these alternative models of assortative group
formation become approximately equivalent to equations (15) and (16) when
one strategy is rare.
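
Equations (15) and (16) are straightforward to evaluate. The sketch below is only an illustration of the two distributions as stated; the function names and the parameter values in the usage note are hypothetical.

```python
from math import comb


def group_prob_reciprocator(j, n, p, r):
    """m(j | T_a), equation (15): chance that a T_a individual has j T_a partners."""
    partner_is_t = r + (1.0 - r) * p      # assortment raises the chance of a T_a partner
    partner_is_u = (1.0 - r) * (1.0 - p)
    return comb(n - 1, j) * partner_is_t**j * partner_is_u ** (n - 1 - j)


def group_prob_defector(j, n, p, r):
    """m(j | U), equation (16): chance that a U individual has j T_a partners."""
    partner_is_t = (1.0 - r) * p
    partner_is_u = r + (1.0 - r) * (1.0 - p)  # assortment raises the chance of a U partner
    return comb(n - 1, j) * partner_is_t**j * partner_is_u ** (n - 1 - j)
```

Each distribution sums to one over j = 0, ..., n - 1, and with r = 0 both reduce to the same binomial distribution, that is, to random group formation. With, say, n = 8, p = 0.3, and r = 0.25, a reciprocator expects more reciprocating partners than a defector does.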

With these assumptions, one can show that T_a can increase when rare only
when

\[
\underbrace{(B/n)\,[(n-1)r + 1] - c}_{\text{“inclusive fitness effect”}}
\; + \;
\underbrace{\frac{w}{1-w} \sum_{j=a}^{n-1} [B(j+1)/n - c]\, m(j \mid T_a)}_{\text{“reciprocity effect”}}
> 0 \tag{17}
\]

As the frequency of reciprocators, p, approaches zero, the probability that a
reciprocator finds itself in a group with j other reciprocators, m(j | T_a), becomes
approximately

\[
m(j \mid T_a) \approx \binom{n-1}{j}\, r^{j} (1-r)^{n-j-1} \tag{18}
\]

Selection can favor cooperative behavior when there is assortative social
interaction even with no possibility of reciprocity, because cooperators are more
likely than defectors to benefit from the cooperation. The first term on the
left-hand side of (17) represents this inclusive fitness effect (Hamilton, 1975). This
term indicates that even if w is zero, T_a can increase as long as the inclusive
fitness of T_a individuals is higher than that of unconditional defectors. In the
present context, the most interesting cases are ones in which the first term is
negative, meaning that cooperation could not be favored without reciprocity.
The second term on the left-hand side of (17) gives the effect of reciprocity
when reciprocators are rare. The added benefit received by reciprocators in
groups in which there are more than a reciprocators is the increase in fitness per
interaction (B(j + 1)/n - c) times the number of additional interactions during
which reciprocators receive the benefit (w/(1 - w)). Reciprocity will aid the
spread of strategies like T_a as long as benefits produced by cooperation in a group
of a + 1 cooperators exceed the costs (B(a + 1)/n - c > 0).
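
The two effects in condition (17) can be computed separately. The sketch below evaluates them using the rare-reciprocator approximation (18); it is an illustration under the linear-payoff assumption, with hypothetical function names and parameter values, and it is not meant to reproduce the parameters behind Figure 8.3.

```python
from math import comb


def rare_invasion_terms(n, a, B, c, r, w):
    """Return (inclusive_fitness_effect, reciprocity_effect) from condition (17).

    Uses approximation (18) for m(j | T_a), valid when reciprocators are rare.
    """
    inclusive = (B / n) * ((n - 1) * r + 1.0) - c
    tail = 0.0
    for j in range(a, n):  # groups in which at least a of the other n - 1 are T_a
        m_j = comb(n - 1, j) * r**j * (1.0 - r) ** (n - 1 - j)
        tail += (B * (j + 1) / n - c) * m_j
    reciprocity = w / (1.0 - w) * tail
    return inclusive, reciprocity


def can_invade_when_rare(n, a, B, c, r, w):
    """True if the left-hand side of (17) is positive."""
    inclusive, reciprocity = rare_invasion_terms(n, a, B, c, r, w)
    return inclusive + reciprocity > 0.0
```

With the hypothetical values n = 3, a = 2, B = 3, c = 2, and r = 0.25, the inclusive fitness effect is -0.5, so cooperation cannot spread without reciprocity. The reciprocity effect outweighs it at w = 0.9 but not at w = 0.8.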

There is a striking synergistic relationship between kin selection and
reciprocity when pairs of individuals interact (Axelrod and Hamilton, 1981). A small
degree of assortative social interaction, coupled with the possibility of long-term
reciprocal relationships, can lead to extensive cooperation in situations in which
neither factor alone would cause cooperation. This synergy diminishes very
rapidly as group size increases according to (17). When r is small and a is a
substantial fraction of n - 1, the reciprocity effect in (17) becomes approximately
proportional to the probability that a of the other n - 1 individuals in the group
are reciprocators. When r is small and a/(n - 1) > r, this probability diminishes
very rapidly as n increases. The clearest case is when a = n - 1. For a given B, c,
and r, the expected number of interactions after the first must increase as (1/r)^{n-1}
for the magnitude of the reciprocity effect to remain constant.

Figure 8.3 illustrates the dramatic nature of this effect. It plots the threshold
values of 1/(1 - w) necessary for T_a to increase when rare as given by expression
(17). We see that assortative group formation may play a significant role in getting
reciprocal cooperation started when groups are small. For example, for n = 3 and
a = 2, even very small amounts of assortment (e.g., r = 1/32) will cause selection
to favor reciprocity even when w is quite small (e.g., individuals interact roughly
10 times). When groups are larger, however, no amount of assortment will cause
selection to favor reciprocity unless w reaches extremely high values. Consider
n = 16 and a = 15. When r = 1/2, cooperation is favored without reciprocity.
When r = 1/4, individuals must interact roughly 10 million times if reciprocity is
to be favored. When a < n - 1, the qualitative picture is similar. T_a can increase
when rare under a somewhat wider range of group sizes, but it remains true that
the reciprocity effect diminishes rapidly as group size increases.

Figure 8.3. This figure presents the threshold values of 1/(1 - w) that must be
exceeded if the strategy T_a is to increase when rare as a function of group size (n) for
four values of r: 1/4, 1/8, 1/16, and 1/32. (a) a = n - 1; (b) a = (3/4)(n - 1).


Conclusion 

Reciprocity is likely to evolve only when reciprocating groups are quite small.
Previous research based on the repeated two-person prisoner’s dilemma game
indicates that pairwise reciprocity will often evolve. Here we have modeled social
interaction within groups of n individuals as a repeated n-person prisoner’s
dilemma game and asked under what conditions selection will favor strategies
leading to reciprocal cooperation. In general, increasing the size of interacting
social groups reduces the likelihood that selection will favor reciprocating
strategies. For quite small groups, the results parallel the two-person case. For larger
groups, however, the conditions under which reciprocity can evolve become
extremely restrictive. This result satisfies the natural historian’s conventional
wisdom: large, cooperative groups composed of distantly related individuals are
unusual in nature. But it leaves human cooperation unexplained.

Reciprocal strategies must satisfy two competing desiderata to succeed. First,
to persist when common, they must prevent too many defectors in the
population from receiving the benefits of long-term cooperation. The threshold number
of cooperators thus must be a substantial fraction of group size. Second, to
increase when rare, there must be a substantial probability that the groups with the
threshold number of cooperators will form. This problem is not great when pairs
of individuals interact; a relatively small degree of assortative group formation
will allow reciprocating strategies to increase. As groups become larger,
however, this desideratum can be satisfied only if the threshold number of cooperators
is fairly small, or the degree of assortment in the formation of groups is large.

Our model omits many features that may be important in potentially
cooperative social interactions. We suspect that three of the most important
missing features are as follows: 

1. No internal sanctions. We precluded the possibility that individuals could 
directly punish defectors. A cooperator in the n-person prisoner’s dilemma can
punish a defector only by withholding future cooperation, which also punishes
other cooperators. Cooperation might flourish under a wider range of conditions 
if cooperators could focus punishment on defectors alone. 

2. No internal structure. Our groups have no internal structure. Cooperation 
might arise in larger groups if individuals interact in some kind of network or 
hierarchy. 

3. Oversimplified game structure. Much of our analysis presumed linear
payoffs. Several authors have argued that other games may be equally important
for our understanding of cooperation. Hirshleifer (1983) has shown that the
nature of the payoff schedule as a function of number of cooperators has
important effects on motivation to cooperate. Kelley and Thibaut (1978) discuss a
large array of mixed-motive games that characterize various social interactions,
and Taylor and Ward (1982) argue that the n-person version of the game
“chicken” is essential to understanding cooperation. It may be that the prisoner’s
dilemma with linear payoffs is particularly demanding for the evolution of
cooperation and that other models would allow the evolution of cooperation in
sizable groups under a wider range of conditions.

Omitting these features certainly argues for caution in interpreting our
results. But including these features would not necessarily allow reciprocity to
evolve in large groups. It is especially unclear what peculiarities of the human
case allow us to violate the generalization to which both theory and the natural
history of nonhuman animals point: the evolution of large cooperative societies
normally depends more on kin selection than reciprocity. Elsewhere we argue
that cultural analogs of kin and group selection are indeed promising
mechanisms to explain human cooperation (Boyd and Richerson, 1982, 1985, chs. 7 and
8). Campbell (1983) hypothesizes that effects like those we have modeled here
suffice to explain the scale of cooperation observed in simpler human societies,
but not in the state-level societies of the last 5,000 years. The range of
plausible arguments is still quite broad. But the sharp decline in the tendency of
reciprocity to evolve as a function of group size, and the apparent rarity of
cooperation in large groups of nonkin in nature, command attention. At the
very least, our results suggest we should view with substantial skepticism and
subject to more searching analysis explanations of human cooperation based on
reciprocity.


APPENDIX 


1. With linear payoffs, and w large enough, there is a single, unstable internal
equilibrium at which the frequency of T_{n-1} is given by equation (13). At any interior
equilibrium, W(T_a) = W(U). With linear payoffs, this requires that

\[
(B/n - c)\left[1 - w + w \sum_{j=a+1}^{n-1} m(j)\right] + w\,[B(a+1)/n - c]\,m(a) = 0 \tag{A1}
\]

If w is large enough that (12) is satisfied, and a = n - 1, this equation can be satisfied
only for one value of p, that given in (13). Since both of the boundary equilibria are
stable when (12) is satisfied, the interior equilibrium is unstable.

2. If n - 1 > a > 0, payoffs are linear, and both conditions in (14) are satisfied,
then there are two interior equilibria p = p_u and p = p_s such that p_u < p_d < p_s; p = p_u is
unstable, and p = p_s is stable. Equation (A1) can be rewritten as follows:

\[
h(p) = w(c - B/n)\,[1 - I_p(a, n-a)] + wB(a/n)\,m(a) = c - B/n \tag{A2}
\]

where I_p(x, y) is the incomplete beta function. First, notice that c - B/n > h(0) =
w(c - B/n) > h(1). Next, differentiating h(p) with respect to p yields this:

\[
\frac{d}{dp} h(p) = w(n - 1 - a)\,m(a-1)\,[B(a+1)/n - c - p(B - c)] \tag{A3}
\]

If B(a + 1)/n - c < 0, h(p) is monotonically decreasing, and therefore there are no
values of p in the interval (0,1) that satisfy (A2); thus no interior equilibria exist. If
B(a + 1)/n - c > 0, h(p) is unimodal with a maximum at p = p_d, where p_d has the
value given in (14). Thus, if h(p_d) > c - B/n, there are two values of p that satisfy
(A1), and if h(p_d) < c - B/n, there are none. Clearly for small enough w, h(p_d) <
c - B/n, and thus there are no interior equilibria. Similarly, since h(p_d) > w(c - B/n),
for w close enough to one, h(p_d) > c - B/n, and there are two interior equilibria.
Further, since h(p_d) is a linear function of w, there is some value of w, w_d, such that
there are no interior equilibria for 0 < w < w_d, and there are two interior equilibria for
w_d < w < 1. By solving (A1) for w and setting p = p_d, one obtains the expression for w_d
given in the text.

From (10), the derivative of p' with respect to p evaluated at an interior
equilibrium point, L, is the following:

\[
L = 1 + \frac{a\,\bigl(w/(1 - w)\bigr)\, m(a \mid p = \hat{p})\,[B(a+1)/n - c - \hat{p}(B - c)]}{W_0 + F(U \mid p = \hat{p})} \tag{A4}
\]

Thus, if p̂ < p_d, L > 1, and the equilibrium is unstable. If p̂ > p_d, L < 1. As long as W_0
is large enough, L > -1, and thus the equilibrium is stable.

3. Populations at stable equilibria involving two strategies, T_a and U (n - 1 >
a > 0), can resist invasion by rare individuals using any other reciprocating strategy,
T_b, where a ≠ b.

When T_b is sufficiently rare, we can ignore the probability that groups with more
than one T_b individual will occur. This means that the fitness of T_b individuals will
depend only on j, the number of T_a individuals in their group. First, suppose that
a > b. Then for j > a, or b > j, F(T_a | j) = F(T_b | j). For a > j > b, F(T_a | j) = B(j + 1)/n - c,
while F(T_b | j) = B(j + 1)/n - c - w(c - B/n) < F(T_a | j) by definition. Thus, in
this case, the expected fitness of the invading type is lower than that of the common
reciprocator. Next, suppose that a < b. Then for j > b, or a > j, F(T_a | j) = F(T_b | j). For
b > j > a, F(T_a | j) = [B(j + 1)/n - c]/(1 - w), while F(T_b | j) = [B(j + 1)/n - c] +
w[Bj/n]/(1 - w) > F(T_a | j) for values of w consistent with the existence of an
interior equilibrium. For j = a, F(T_a | j) = [B(j + 1)/n - c]/(1 - w), while F(T_b | j) =
[B(j + 1)/n - c] + wBj/n < F(T_a | j) for values of w consistent with the existence of an
interior equilibrium. Thus,


\[
W(T_b) - W(T_a) = w\,m(a)\left[ B(a/n) - \frac{B(a+1)/n - c}{1 - w} \right]
+ \sum_{j=a+1}^{b-1} w\,m(j)\,\frac{c - B/n}{1 - w} \tag{A5}
\]

By using (11) to eliminate terms containing m(a), (A5) becomes:

\[
W(T_b) - W(T_a) = w \sum_{j=b}^{n-1} m(j)\,\frac{B/n - c}{1 - w} + w \sum_{j=0}^{a-1} m(j)\,(B/n - c),
\]

which is always less than zero.


NOTE 

We thank Joan Silk and John Wiley for extremely useful comments on previous 
drafts of this chapter. 
Punishment Allows the Evolution of Cooperation (or Anything Else) in Sizable Groups


Human behavior is unique in that cooperation and division of labor
occur in societies composed of large numbers of unrelated individuals. In other
eusocial species, such as social insects, societies are made up of close genetic
relatives. According to contemporary evolutionary theory, cooperative behavior
can be favored by selection only when social groups are formed so that
cooperators are more likely to interact with other cooperators than with
noncooperators (Hamilton, 1975; Brown, Sanderson, and Michod, 1982; Nunney,
1985). It is widely agreed that kinship is the most likely source of such
nonrandom social interaction. Human society is thus an unusual and interesting
special case of the evolution of cooperation.

A number of authors have suggested that human eusociality is based on
reciprocity (Trivers, 1971; Wilson, 1975; Alexander, 1987), supported by our
more sophisticated mental skills to keep track of a large social system. It seems
unlikely, however, that natural selection will favor reciprocal cooperation in
sizable groups. An extensive literature (reviewed by Axelrod and Dion, 1989;
also see Hirshleifer and Martinez-Coll, 1988; Boyd, 1988; Boyd and Richerson,
1989) suggests that cooperation can arise via reciprocity when pairs of
individuals interact repeatedly. These results indicate that the evolutionary
equilibrium in this setting is likely to be a contingent strategy with the general form:
“cooperate the first time you interact with another individual, but continue to
cooperate only if the other individual also cooperates.” Several recent articles
(Joshi, 1987; Bendor and Mookherjee, 1987; Boyd and Richerson, 1988, 1989)
present models in which larger groups of individuals interact repeatedly in
potentially cooperative situations. These analyses suggest that the conditions under
which reciprocity can evolve become extremely restrictive as group size
increases above a handful of individuals.

In most existing models, reciprocators retaliate against noncooperators by
withholding future cooperation. In many situations other forms of retaliation are
possible. Noncooperators could be physically attacked, be made the targets of
gossip, or be denied access to territories or mates. We will refer to such alternative
forms of punishment as retribution. It seems possible that selection may favor
cooperation enforced by retribution even in sizable groups of unrelated
individuals because, unlike withholding reciprocity, retribution can be directed only
at noncooperators, and because the magnitude of the penalty imposed on
noncooperators is not limited by an individual’s effect on the outcome of
cooperative behavior.

Here, we extend the theory of the evolution of cooperation to include the
possibility of retribution. We review models of the evolution of reciprocity in
sizable groups and present a model of the evolution of cooperation enforced by
retribution. An analysis of this model suggests that retribution can lead to the
evolution of cooperation in two qualitatively different ways.

1. If the long-run benefits of cooperation to a punishing individual are
greater than the costs to that single individual of coercing all other
individuals in a group to cooperate, then strategies that cooperate
and punish noncooperators, strategies that cooperate only if punished,
and, sometimes, strategies that cooperate but do not punish
coexist at a stable equilibrium or in stable oscillations.

2. If the costs of being punished are large enough, “moralistic” strategies
that cooperate, punish noncooperators, and punish those who
do not punish noncooperators can be evolutionarily stable.

We also show, however, that moralistic strategies can cause any individually 
costly behavior to be evolutionarily stable, whether or not it creates a group 
benefit. Once enough individuals are prepared to punish any behavior, even the 
most absurd, and to punish those who do not punish, then everyone is best off 
conforming to the norm. Moralistic strategies are a potential mechanism for 
stabilizing a wide range of behaviors. 


Models of the Evolution of Reciprocity 

Models of the evolution of reciprocity among pairs of individuals share many
common features. Each assumes that there is a population of individuals. Pairs of
individuals are sampled from this population and interact a number of times.
During each interaction individuals may either cooperate (C) or defect (D). The
incremental fitness effects of each behavior define a single-period prisoner’s
dilemma, and, therefore, cooperative behavior is altruistic in the sense that it
reduces the fitness of the individual performing the cooperative behavior but
increases the fitness of the other individual in the pair (Axelrod and Hamilton,
1981; Boyd, 1988). Each individual is characterized by an inherited strategy that
determines how he will behave. Strategies may be fixed rules like unconditional
defection (“always defect”) or contingent ones like tit-for-tat (“cooperate during
the first interaction; subsequently do whatever the other individual did last
time”). The pair’s two strategies determine the effect of the entire sequence of
interactions on each pair member’s fitness. An individual’s contribution to the
next generation is proportional to his fitness.
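
The pairwise setup just described can be sketched in a few lines of Python. This is only an illustration: the payoff values (a benefit of 3 to the partner and a cost of 1 to the actor), the fixed number of rounds, and all function names are assumptions made for the example, not quantities taken from the models cited.

```python
# Minimal sketch of a repeated two-person prisoner's dilemma.
# Assumed payoffs: cooperating costs the actor COST and gives the partner BENEFIT.
BENEFIT, COST = 3.0, 1.0

def always_defect(own_history, other_history):
    # Unconditional defection ("always defect").
    return "D"

def tit_for_tat(own_history, other_history):
    # Cooperate on the first interaction, then copy the partner's previous move.
    return "C" if not other_history else other_history[-1]

def play(strategy_a, strategy_b, rounds):
    """Return the total incremental fitness of each player after `rounds` interactions."""
    hist_a, hist_b = [], []
    fit_a = fit_b = 0.0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        if move_a == "C":
            fit_a -= COST
            fit_b += BENEFIT
        if move_b == "C":
            fit_b -= COST
            fit_a += BENEFIT
        hist_a.append(move_a)
        hist_b.append(move_b)
    return fit_a, fit_b
```

With these assumed numbers, two tit-for-tat players each gain 2 per interaction, whereas a tit-for-tat player paired with an unconditional defector loses only the cost of its first cooperative act.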

Analysis of such models suggests that lengthy interactions between pairs of 
individuals are likely to lead to the evolution of reciprocity. Reciprocating 
strategies, like tit-for-tat, leading to mutual cooperation, are successful if pairs of 
individuals are likely to interact many times. A population in which
unconditional defection is common can resist invasion by cooperative strategies under a
wide range of conditions. However, there seem to be a variety of plausible 
mechanisms that allow reciprocating strategies to increase when rare. Axelrod 
and Hamilton (1981) and Axelrod (1984) have shown that a very small degree of 
assortative group formation, when coupled with the possibility of prolonged 
reciprocity, allows strategies like tit-for-tat to invade noncooperative
populations. Other mechanisms have been suggested by Peck and Feldman (1985),
Boyd and Lorberbaum (1987), and Feldman and Thomas (1987). 

Recent work suggests that these conclusions do not apply to larger groups.
Joshi (1987) and Boyd and Richerson (1988) have independently analyzed a
model in which n individuals are sampled from a larger population and then
interact repeatedly in an n-person prisoner’s dilemma. In this model, cooperation
is costly to the individual, but beneficial to the group as a whole. This work
suggests that increasing the size of interacting social groups reduces the
likelihood that selection will favor reciprocating strategies. As in the
two-individual case, if groups persist long enough, both reciprocal and
noncooperative behavior are favored by selection when they are common. For large
groups, however, the conditions under which reciprocity can increase when rare
become extremely restrictive. Bendor and Mookherjee (1987) show that when errors
occur, reciprocal cooperation may not be favored in large groups even if they
persist forever. Boyd and Richerson (1989) derived qualitatively similar results in
which groups were structured into simple networks of cooperation.

Intuitively, increasing group size places reciprocating strategies on the horns
of a dilemma. To persist when common, they must prevent too many defectors in
the population from receiving the benefits of long-term cooperation. Thus,
reciprocators must be provoked to defect by the presence of even a few defectors.
To increase when rare, there must be a substantial probability that groups
with fewer than this number of defectors will form. This problem is not great when
pairs of individuals interact; a relatively small degree of assortative group
formation will allow reciprocating strategies to increase. As groups become larger,
however, both of these requirements can be satisfied only if the degree of
assortment in the formation of groups is extreme.

This result should be interpreted with caution. Modeling social interaction as
an n-person prisoner’s dilemma means that the only way a reciprocator can punish
a defector is by withholding future cooperation. There are two reasons to suppose
that cooperation might be more likely to evolve if cooperators could retaliate
in some other way. First, in the n-person prisoner’s dilemma, a reciprocator
who defects in order to punish defectors induces other reciprocators to defect.
These defections induce still more defections. More discriminating retribution
would allow defectors to be penalized without generating a cascade of defection.
Second, in the n-person prisoner’s dilemma the severity of the sanction is limited
by an individual’s effect on the whole group, which becomes diluted as group
size increases. Other sorts of sanctions might be much more costly to defectors
and therefore allow rare cooperators to induce others to cooperate in large
groups.

There is also a problem with retribution. Why should individuals punish? If
being punished is sufficiently costly, it will pay to cooperate. However, by
assumption, the benefits of cooperation flow to the group as a whole. Thus, as long
as administering punishment is costly, retribution is an altruistic act. Punishment
is beneficial to the group but costly to the individual, and selection should
favor individuals who cooperate but do not punish. This problem is sometimes
referred to as the problem of “second-order” cooperation (Oliver, 1980;
Yamagishi, 1986).

A recent article by Axelrod (1986) illustrates the problem of second-order
cooperation. Axelrod analyzes a model in which groups of individuals interact 
for two periods. During the first period individuals may cooperate or defect in an 
n-person prisoner’s dilemma, and in the second, individuals who cooperated on 
the first move have the opportunity to punish those individuals who did not 
cooperate at some cost to themselves. Axelrod shows that punishment may 
expand the range of conditions under which cooperation could evolve. However, 
the strategy of cooperating but not punishing was precluded. As Axelrod notes, 
such second-order defecting strategies would always do better because second- 
order punishment of nonpunishers is not possible. 

The problem of second-order cooperation has been partly solved by
Hirshleifer and Rasmusen (1989). They consider a game-theoretic model in
which a two-stage game consisting of a cooperation stage followed by a
punishment stage is repeated a number of times. They show that if punishment is
costless, then the strategy of cooperating, punishing noncooperators, and
punishing nonpunishers is what game theorists call a “perfect equilibrium.” (The
perfect equilibrium is a generalization of the Nash equilibrium that is useful in
repeated games. See Rasmusen, 1989, for an excellent introductory discussion of
game-theoretic equilibrium concepts.) Because theirs is a game-theoretic model, it
does not provide information about the evolutionary dynamics. It also seems
possible that if the model were extended to an infinite number of periods, a
similar strategy would be evolutionarily stable even if punishment is costly.

Here we consider evolutionary properties of an infinite period model of 
cooperation with the possibility of punishment that is similar to Hirshleifer and 
Rasmusen’s. We will perform the analysis in three stages. First, we describe the 
basic structure of the model. Then, we consider populations in which there are 
cooperators who punish defection and a variety of strategies that initially defect 
and then respond to punishment in different ways. The goal is to investigate the 
evolutionary dynamics introduced by retribution without the complications
introduced by second-order defection and second-order punishment. Finally, we
consider the effects of these complications.


Description of the Model

Suppose that groups of size n are sampled from a large population and interact
repeatedly. The probability that the group persists from one interaction to the
next is w, and thus the probability that it persists for t or more interactions is
w^{t-1}. Each interaction consists of two stages, a cooperation stage followed by a
punishment stage. During the cooperation stage an individual can either
cooperate (C) or defect (D). The incremental effect of a single cooperation stage on
the fitness of an individual depends on that individual’s behavior and the
behavior of other members of the group as follows: let the number of other
individuals choosing C during a particular turn be i. Then the payoffs to individuals
choosing C and D are:


$$V(C \mid i) = \frac{b}{n}(i + 1) - c \tag{1}$$

$$V(D \mid i) = \frac{b}{n}\,i \tag{2}$$


where b > c and c > b/n. Increasing the number of cooperators increases the
payoff for every individual in the group, but each cooperator would be better off
switching to defection. (This special case of the n-person prisoner’s dilemma has
been used in economics and political science to represent provision of public
goods [Hardin, 1982]. It is also identical to the linear model of social interactions
used in most kin selection models.) During the punishment stage any individual
can punish any other individual. Punishing another individual lowers the fitness
of the punisher by an amount k and the fitness of the individual being punished by
an amount p.

Each individual is characterized by an inherited “strategy” that specifies how 
she will behave during any time period based on the history of her own behavior 
and the behavior of other members of the group up to that point. The strategy 
specifies whether the individual will choose cooperation or defection during the 
cooperation stage and which other individuals, if any, she will punish during the 
punishment stage. Strategies can be unconditional rules like the asocial rule “never 
cooperate/never punish.” They can also be contingent rules like “always cooperate/ 
punish all individuals who didn’t cooperate during the cooperation stage.” 

We assume that individuals sometimes make errors. In particular, we suppose
that any time an individual’s strategy calls for cooperation, there is a
probability e > 0 that the individual will instead defect “by mistake.” This is the 
only form of error we investigate. Individuals who mean to defect always defect, 
and individuals always either punish or do not punish according to the dictates of 
their strategy. 

Groups are formed according to the following rule: the conditional probability
that any other randomly chosen individual in a group has a given strategy
S_i, given that the focal individual also has S_i, is given by:

$$\Pr(S_i \mid S_i) = r + (1 - r)\,q_i \tag{3}$$

where q_i is the frequency of the strategy S_i in the population before social
interaction, and 0 ≤ r ≤ 1. The conditional probability that any other randomly
chosen individual in a group has some other strategy S_j, given that the focal
individual has S_i, is given by:

$$\Pr(S_j \mid S_i) = (1 - r)\,q_j \tag{4}$$

When r = 0, social interaction occurs at random. When r > 0, social interaction is
assortative. There is a chance r of drawing an individual with the same strategy as
the focal individual and a chance 1 − r of picking an individual at random from
the population (who will also be identical to the focal individual with probability
equal to the frequency of the focal individual’s strategy in the population). If
strategies are inherited as haploid sexual traits, r is just the coefficient of
relatedness. For other genetic models, r is not exactly equal to the coefficient of
relatedness. However, it is a good approximation for rare strategies and thus is
useful for determining the conditions under which a rare reciprocating strategy
can invade a population in which defection is common.
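
The group-formation rule can be illustrated with a short Python sketch. The function name and the dictionary representation of strategy frequencies are assumptions made for this example.

```python
def partner_probabilities(focal, freqs, r):
    """Probability that a randomly chosen group mate has each strategy,
    given the focal individual's strategy.

    focal: name of the focal individual's strategy
    freqs: dict mapping strategy name -> population frequency q (summing to 1)
    r:     assortment parameter, 0 <= r <= 1
    """
    probs = {}
    for strategy, q in freqs.items():
        probs[strategy] = (1 - r) * q   # group mate drawn at random from the population
        if strategy == focal:
            probs[strategy] += r        # group mate drawn as a copy of the focal type
    return probs
```

The returned probabilities always sum to one, and with r = 0 they reduce to the population frequencies, which is the random-interaction case.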

After all social interactions are completed, individuals in the population
reproduce. The probability of reproduction is determined by the results of social
behavior. Thus, the frequency of a particular strategy, S_i, in the next generation,
q_i', is given by:

$$q_i' = \frac{q_i\,W(S_i)}{\sum_j q_j\,W(S_j)} \tag{5}$$

where W(S_i) is the average payoff of individuals using strategy S_i in all groups
weighted by the probability that different types of groups occur. (As argued by
Brown et al., 1982, this assumption is consistent with haploid genetic
inheritance of strategies and some simple forms of cultural transmission.) We then ask,
which strategies or combinations of strategies can persist?
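
Reproduction in proportion to payoff can be sketched as a single update step. The sketch below assumes the standard normalized form, in which each strategy's new frequency is its old frequency times its average fitness, divided by the population mean fitness; it also assumes that fitness values are positive (for example, a baseline fitness plus the payoffs from social interaction). The function name is hypothetical.

```python
def next_generation(freqs, fitness):
    """One generation of fitness-proportional reproduction.

    freqs:   dict strategy -> frequency before reproduction (summing to 1)
    fitness: dict strategy -> average fitness W (assumed positive)
    """
    mean_fitness = sum(freqs[s] * fitness[s] for s in freqs)
    return {s: freqs[s] * fitness[s] / mean_fitness for s in freqs}
```

Iterating this step from different starting frequencies is one way to explore numerically which strategies or mixtures persist.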


Results 

No Second-Order Defection 

First, we analyze the evolutionary dynamics of retribution with second-order 
defection excluded. To do this, we consider a world in which only the following 
two strategies are possible. 

Cooperator-punishers (P). During each interaction (1) cooperate,
and (2) punish all individuals who did not cooperate during the
cooperation stage.

Reluctant cooperators (R1). Defect until punished once, then
cooperate forever. Never punish.

We temporarily exclude strategies that cooperate but do not punish to
eliminate the possibility of second-order defection. We also exclude strategies
that continue to defect after one act of punishment. This latter assumption is not
harmless. We show in the Appendix that if R1 is replaced by unconditional
defection, then (1) cooperation is much less likely to evolve, and (2) R1 may not
be able to invade a population in which unconditional defection is common. This
analysis is justified for two reasons: first, it provides a best case for the evolution
of cooperation, and, second, there is abundant empirical evidence that organisms
do respond to punishment.

When groups are formed at random (r = 0), such a population can persist at
one of three stable equilibria (or ESSs):

• All individuals are R1; no one cooperates.

• All individuals are P; everyone cooperates.

• Most individuals are R1, but a minority are P; most are induced to
cooperate by the punishing few.

In what follows we describe and interpret the conditions under which each of 
these ESSs can exist. Proofs are given in the Appendix. 

Reluctant cooperators resist invasion by the cooperating, punishing strategy
whenever the cost to a cooperator-punisher of cooperating and punishing n − 1
reluctant cooperators exceeds the benefit to that punisher that results from the
cooperation that is induced by his punishment. It can be shown that the responsive
defecting strategy R1 can be invaded by the cooperating, punishing strategy P as long as:

$$k(n - 1) + \left(c - \frac{b}{n}\right) < \frac{w}{1 - w}\left((b - c) - \frac{e\,k(n - 1)}{1 - e}\right) \tag{6}$$

(left-hand side: initial cost of cooperating and punishing; right-hand side:
long-run benefit induced by punishing)

When cooperator-punishers are rare, and groups are formed at random, virtually
all cooperator-punishers will find themselves in a group in which the other n − 1
individuals are defectors. The left-hand side of (6) gives the fitness loss associated
with cooperating, and then punishing n − 1 defectors during the first interaction.
The right-hand side of (6) gives the long-term net fitness benefit of the
cooperation that results from punishment. The term w(b − c)/(1 − w) is the long-term
fitness benefit from the induced cooperation by R1 individuals, and the term
proportional to e is the long-run cost that results from having to punish
erroneous defections. Thus, if this term is positive, P can invade if w is large enough.

If the cooperator-punisher strategy, P, can increase when rare, punishing is
not altruistic. Retribution induces cooperation that creates benefits sufficient to
compensate for its cost. The longer groups persist, the larger the benefit
associated with cooperation. Thus, as long as error rates are low or the benefits of
cooperation are large, longer interactions will permit cooperative strategies to
invade, even if groups are formed at random. Also notice that the condition for
R1 to be invaded does not depend on p, the cost of being punished. As one would
expect, increasing the group size or the error rate makes it harder for the
cooperative strategy to invade.
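
In the error-free limit (e = 0) the cost of punishing erroneous defections vanishes, so the invasion condition reduces to the initial cost k(n − 1) + (c − b/n) being smaller than the long-run benefit w(b − c)/(1 − w). The sketch below, which holds only under that simplifying assumption, solves for the smallest persistence probability w that allows P to invade. Function names and parameter values are illustrative, not taken from the analysis in the text.

```python
def min_persistence_for_invasion(n, b, c, k):
    """Smallest w with k(n-1) + (c - b/n) < w(b - c)/(1 - w), assuming e = 0."""
    initial_cost = k * (n - 1) + (c - b / n)
    return initial_cost / (initial_cost + (b - c))

def expected_interactions(w):
    """Expected number of interactions when a group persists with probability w."""
    return 1.0 / (1.0 - w)

# Illustrative parameter values only.
for n in (4, 8, 16, 32):
    w_star = min_persistence_for_invasion(n, b=2.0, c=1.0, k=0.1)
    print(n, round(w_star, 3), round(expected_interactions(w_star), 2))
```

In this limit the threshold expected number of interactions is 1 + [k(n − 1) + c − b/n]/(b − c), which grows roughly in proportion to group size and does not involve p.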

The cooperating-punishing strategy, P, is evolutionarily stable as long as 

$$p(n - 1) > \left(c - \frac{b}{n}\right) + \frac{e\,k(n - 1)}{(1 - e)(1 - w)} \tag{7}$$

(left-hand side: cost of being punished; right-hand side: cost of cooperating
and punishing)

The first term on the right-hand side of (7) gives the cost of cooperating during
one interaction; the term on the left-hand side is the cost of being punished by
n − 1 other individuals, and the second term on the right-hand side is the cost
of punishing mistakes over the long run. The rare R1 individual suffers the cost
of punishment but avoids the cost of cooperating on the first turn and the cost of
punishing erroneous defection over the long run. Notice that this condition is
independent of the long-run expected benefit associated with cooperation
(because it does not contain terms of the form b/(1 − w)). It depends only on the
cost of the cooperation to the individual and the costs of punishing and being
punished. Thus, retribution can stabilize cooperation, but this stability does not
result from the mutual benefits of cooperation.

There is a stable internal equilibrium at which both P and R1 are present
whenever (1) neither R1 nor P is an ESS, or (2) R1 is not an ESS but P is, and the
condition (A14) given in the Appendix is satisfied. We have not been able to
derive an expression for the frequency of P at the internal equilibrium. Figure 9.1
shows the frequency of P at this equilibrium determined numerically as a
function of the expected number of interactions (plotted as log(1/(1 − w))) for
various group sizes. When groups persist for only a few interactions, both P and
R1 are ESSs. Increasing the number of interactions eventually destabilizes R1 and
allows a stable internal equilibrium to exist. Further increases in the expected
number of interactions destabilize P, leaving the internal equilibrium as the only
stable equilibrium.
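
The quantity 1/(1 − w) follows from the persistence assumption. As a short worked step (a sketch, assuming the first interaction always occurs and each later one occurs with probability w given that the previous one did), let T be the number of interactions a group experiences:

```latex
\Pr(T \ge t) = w^{t-1}, \qquad
E[T] = \sum_{t=1}^{\infty} \Pr(T \ge t)
     = \sum_{t=1}^{\infty} w^{t-1}
     = \frac{1}{1 - w}.
```

For example, w = 0.9 corresponds to an expected 10 interactions.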

Without second-order defection, cooperation can persist at two qualitatively
different equilibria: either cooperative strategies coexist with noncooperative
strategies at a polymorphic equilibrium, or all individuals in the population are
cooperative. When the cooperator-punisher strategy is very rare, it will increase
whenever the benefit from long-run cooperation to an individual punisher
exceeds the cost of the punishment necessary to induce reluctant cooperators to
cooperate. As cooperator-punishers become more common, more reluctant
cooperators find themselves in groups with at least one cooperator-punisher, and
thus they enjoy the benefits of long-run cooperation without bearing the costs
associated with punishing. Thus the relative fitness of cooperator-punishers
declines. As cooperator-punishers become still more common, reluctant
cooperators are punished more harshly during the initial interaction and their
relative fitness declines.

Figure 9.1. The equilibrium frequency of P for a given expected number of interactions
for different group sizes (n = 8, 16, 32), assuming that e = 0.001. For these parameter
values, populations consisting of all P are always at a stable equilibrium. Populations
without P individuals are also always at an equilibrium, but it may be either stable or
unstable. To find the polymorphic equilibria, pick a number of expected interactions
and a group size, and then determine the frequencies of P at which the horizontal line
at that value of log(1/(1 − w)) intersects the curve for that value of n. If the horizontal
line lies below the curve for some q_P, then the frequency of P increases; if it lies above
the curve, the frequency of P decreases. Thus, if there is only one polymorphic
equilibrium (e.g., n = 4, log(1/(1 − w)) = 1), it is unstable and q_P = 0 is stable. If there are
two polymorphic equilibria (e.g., n = 16, log(1/(1 − w)) = 3), the polymorphic equilibrium
with the lower frequency of P is stable, and the other polymorphic equilibrium and
q_P = 0 are both unstable. Finally, if there is no polymorphic equilibrium (e.g., n = 8,
log(1/(1 − w)) = 4), the only stable equilibrium is q_P = 1.

Assortative group formation has both positive and negative effects on the
conditions under which cooperator-punishers evolve. When there is assortative
group formation, individuals are more likely to find themselves in groups with
others like themselves than chance alone would dictate. Such assortment
decreases the cost of cooperating and punishing because cooperators are more
likely to receive the benefits that result from the cooperative acts of others than
are noncooperators and because cooperator-punishers need to punish fewer
noncooperators on the first interaction. However, assortment decreases the
long-run benefit associated with punishment because cooperator-punishers are more
likely to be punished for erroneous defection. (Assortment increases the amount
of punishment that an inadvertently defecting cooperator-punisher receives.)
The second effect becomes more pronounced the longer groups last because
cooperator-punishers will make more errors. The negative effect will
predominate whenever the following condition is satisfied:

(1 − e)(b/n + k) < [right-hand side illegible in source]    (8)

When expression (8) is satisfied, assortment increases the range of conditions
under which R1 is an ESS, decreases the range of conditions under which P is an
ESS, and, if a stable internal equilibrium exists, decreases the frequency of P at
that equilibrium. Note that the negative effects increase as the expected number
of interactions increases. When (8) is not satisfied, increasing r decreases the range
of parameters under which R1 is an ESS, increases the range under which P is an
ESS, and may either increase or decrease the frequency of P at internal equilibria.

Second-Order Defection

When punishers are common, cooperation is favored because cooperative
individuals avoid punishment. Thus, if punishment is costly, punishment may be
an altruistic act. It is costly to the individual performing the punishment but
benefits the group as a whole. This argument suggests that individuals who
cooperate, but do not punish, should be successful. In the previous model (and
that of Axelrod, 1986) cooperators always punish noncooperators, and thus this
conjecture could not be addressed. To allow for second-order defection, consider
a model in which P and R1 compete with the following strategy.

Easygoing cooperators (E): Always cooperate, never punish.

When second-order defection is possible, neither E nor P is ever an ESS. A
population in which P is common can always be invaded by E, because easygoing
cooperators get the benefits of cooperation without incurring the cost of
enforcement. A population in which E is sufficiently common can always be
invaded by R1, because reluctant cooperators can enjoy the benefits of cooperation
without fear of punishment.

R1 is an ESS whenever punishment does not pay (i.e., [6] is not satisfied). At
this ESS, there is no cooperation because reluctant cooperators behave as
unconditional defectors. If the long-run benefits of cooperation to an individual are
not sufficient to offset the cost of coercing all the other members of the group to
cooperate, the noncooperators can resist invasion by punishing or cooperating
strategies. Persistent noncooperation is not the only possible outcome, however,
under this condition. If P can resist invasion by R1 (i.e., [7] is satisfied), then
simulation studies indicate that there may be persistent oscillations involving all
three strategies. Such oscillations seem to require that the cost of being punished
be much greater than the cost of punishing and that the benefits of cooperation
barely exceed the cost (b ≈ c).

If punishment does pay, the long-run outcome is a mix of reluctant
cooperators who coexist with cooperator-punishers and, sometimes, easygoing
cooperators. This can happen in three different ways:

• There can be a stable mix of reluctant cooperators and
cooperator-punishers. Such a stable equilibrium exists anytime there is a stable
polymorphic equilibrium on the R1−P boundary in the absence of
E. If, in addition, P is not an ESS in the absence of E, this mixture of
reluctant cooperators and cooperator-punishers is the only stable
equilibrium, and numerical simulations suggest that the polymorphic
equilibrium is globally stable. Thus, at equilibrium, populations will
consist of a majority of reluctant cooperators with a minority of
cooperator-punishers. E cannot invade because rare E individuals
often find themselves in groups without a cooperator-punisher and
thus pay the cost of cooperation without receiving the long-run
benefits of cooperation. Punishers in all groups receive the benefits
of long-term cooperation.

• If there is no polymorphic equilibrium on the R1−P boundary (i.e.,
in the absence of E), then there is a single interior equilibrium point
at which all three strategies are present. We have not been able to
derive an expression for the frequencies of the three traits at these
interior equilibria or determine when they are stable. Numerical
simulation indicates that when an interior equilibrium exists, it is
almost always stable.

• The mixture of all three strategies can oscillate. When P is stable in
the absence of E, the frequencies of the three strategies may oscillate
indefinitely. Simulation studies suggest that this outcome occurs
only under relatively rare parameter combinations.

In each case, as group size increases, the average frequency of cooperative
strategies typically declines to a quite low level. However, the average frequency
of groups with at least one P individual, and therefore groups in which
cooperation occurs over the long run, can remain at substantial levels even when
groups are large. One must keep in mind, however, that this conclusion
presupposes that individual punishers can afford to punish every noncooperator in
the group. A model in which the capacity to punish is limited would presumably
stabilize at some higher frequency of punishers as group size increased.

Moralistic Strategies 

The results of the previous section suggest that strategies that attempt to induce 
cooperation through retribution can always be invaded when they are common 
by strategies that cooperate but do not punish. However, such is not the case. 
Consider the following strategy. 

Moralists (M): Always cooperate, and punish individuals who are not
in “good standing.” Individuals are in good standing if they have
behaved according to M since the last time they were punished or the
beginning of the interaction.

Thus, moralists punish individuals who do not cooperate. But they also punish
those who do not punish noncooperators and those who do not punish
nonpunishers. Each M individual punishes others at most once per turn. Once an
individual is punished, he can avoid further punishment by cooperating, punishing
noncooperators, and punishing nonpunishers (thus returning to good standing).

Moralists can resist invasion by reluctant cooperators (R1) whenever the
following is true:

[Inequality (9) is garbled in the source. Its left-hand side is the cost of being
punished; its right-hand side is the cost of cooperating and punishing.]

The left-hand side of inequality (9) gives the cost to an R1 individual of being
punished. It is proportional to the number of interactions because such reluctant
cooperators are punished every time there is an error. The right-hand side is the
cost of cooperating and punishing. Thus, as long as the error rate is not exactly
zero, moralists can resist invasion by R1 under a wider range of conditions than
can P.

Moralists can resist invasion by easygoing cooperators (E) whenever the
following condition is satisfied:

$$\left(1 - (1 - e)^n\right) w p > e k \tag{10}$$

If errors occur only infrequently (ne ≪ 1), then this condition simplifies to
become nwp > k. Thus, unless punishing is much more costly than being punished,
moralists can resist invasion by easygoing cooperators.
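
Condition (10) and its small-error simplification are easy to compare numerically. The following is a minimal sketch; the function names and the parameter values used in any call are illustrative assumptions.

```python
def moralists_resist_easygoing(n, e, w, p, k):
    """Condition (10): (1 - (1 - e)**n) * w * p > e * k."""
    return (1 - (1 - e) ** n) * w * p > e * k

def moralists_resist_easygoing_approx(n, w, p, k):
    """Small-error form of condition (10), appropriate when n*e << 1: n*w*p > k."""
    return n * w * p > k
```

When ne is small, 1 − (1 − e)^n is close to ne, so dividing both sides of (10) by e gives the simplified form; the two functions should then agree except for parameter values very near the boundary.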

In fact, as Hirshleifer and Rasmusen (1989) have pointed out, moralistic
aggression of this kind is a recipe for stabilizing any behavior. Notice that neither
condition (9) nor (10) involves terms representing the long-run benefits of
cooperation (i.e., terms of the form b/(1 − w)). When M is common, rare individuals
deviating from M are punished; otherwise, they have no effect on the behavior of
the group. Thus, as long as being punished by all the other members of the group
is sufficiently costly compared to the individual benefits of not behaving according
to M, M will be evolutionarily stable. It does not matter whether or not the
behavior produces group benefits. The moralistic strategy could require any
arbitrary behavior: wearing a tie, being kind to animals, or eating the brains of dead
relatives. Then M could resist invasion by individuals who refuse to engage in the
arbitrary behavior unless punished, as long as condition (9) was satisfied (where
c − b/n is the cost of the behavior), and resist invasion by individuals who perform
the behavior but do not punish others, as long as (10) is satisfied.


Discussion 

Our results suggest that problems of second-order cooperation can be overcome 
in two quite different ways. First, even though retribution creates a group benefit,
it need not be altruistic. If defectors respond to punishment by a single individual 
by cooperating, and if the long-run benefits to the individual punisher are greater 
than the costs associated with coercing other group members to cooperate, then 
the strategy that cooperates and punishes defectors can increase when rare and 
will continue to increase until an interior equilibrium is reached. At this equi¬ 
librium, the punishing strategy coexists with strategies that initially defect but 
respond to punishment by cooperating and, sometimes, strategies that cooperate 
but do not punish. For plausible parameter values, the punishing strategy is rarer 
than the other two strategies at such an equilibrium. However, since a single 
punisher is sufficient to induce cooperation, cooperating groups are nonetheless 
quite common. 

Increasing group size reduces the likelihood that this mechanism will lead to 
the evolution of cooperation because it increases the cost of coercion. This 
effect, however, is not nearly so strong as in previous models in which defection was 
punished by withdrawal of cooperation. In those models (Joshi, 1987; Boyd and 
Richerson, 1988, 1989), a linear increase in group size requires an exponential 
increase in the expected number of interactions necessary for cooperation to 
increase when rare. In the present model, the same increase in group size 
requires only a linear increase in the expected number of interactions. 

Moralistic strategies that punish defectors, individuals who do not punish 
noncooperators, and individuals who do not punish nonpunishers can also 
overcome the problem of second-order cooperation. When such strategies are common, 
rare noncooperators are selected against because they are punished. Individuals 
who cooperate but do not punish are selected against because they are also 
punished. In this way, selection may favor punishment, even though the cooperation 
that results is not sufficient to compensate individual punishers for its costs. 

It is not clear whether moralistic strategies can ever increase when rare. We 
have not presented a complete analysis of the dynamics of moralistic strategies 
because to do so in a sensible way would require the introduction of additional 
strategies, a consideration of imperfect monitoring of punishment, and a 
consideration of more general temporal patterns of interaction. We conjecture, 
however, that the dynamics will be roughly similar to the dynamics of P and R1 
in the case in which there is no stable internal equilibrium: both defecting and 
moralistic strategies will be evolutionarily stable. Increasing the degree of 
assortment will mean that moralists will have fewer defectors to punish but will be 
punished more when they err. Assortative social interaction will not interact 
with group benefits in a way that will allow moralistic strategies to increase. 

It is also interesting that moralistic strategies stabilize any behavior. The 
conditions that determine whether M can persist when rare are independent of the 
magnitude of the group benefit created by cooperation. The moralistic strategy 
could stabilize any behavior equally well, whether it is beneficial or not. If our 
conjecture about the dynamics of M is correct, then the dynamics will not be 
strongly affected by whether or not the sanctioned behavior is group-beneficial. 

This result is reminiscent of the “folk theorem” from mathematical game 
theory. This theorem holds that in the repeated prisoner’s dilemma with a 
constant probability of termination (the case analyzed by Axelrod and most 
other evolutionary theorists), strategies leading to any pattern of behavior can be 
a game theoretic perfect equilibrium (Rasmusen 1989). The proof of this 
theorem relies on the fact that if there is enough time available (on average) for 
punishment, then individuals can be induced to adopt any pattern of behavior. 
Thus, in games without a known endpoint, game theory may predict that 
anything can happen. This result, combined with the fact that nobody lives forever, 
has led many economists to restrict their analyses to games with known 
endpoints. The diversity of equilibria here and in the nonevolutionary analysis can be 
regarded as a flaw or embarrassment for the analysis. 

We prefer to take these results as telling us something about the evolution of 
social behavior. Games without a known endpoint seem to us to be a good 
model for many social situations. Although nobody lives forever, social groups 
often persist much longer than individuals. When they do, individuals can expect 
to be punished until their own last act. Even dying men are tried for murder, and 
in many societies one’s family is also subject to retribution. If one accepts this 
argument, then it follows that moralistic punishment is inherently diversifying in 
the sense that many different behaviors may be stabilized in exactly the same 
environment. It may also provide the basis for stable among-group variation. 
Such stable among-group variation can allow group selection to be an important 
process (Boyd and Richerson 1985, 1990a,b), leading to the evolution of 
behaviors that increase group growth and persistence. 


Conclusion 

Cooperation enforced by retribution is strikingly different from reciprocity in 
which noncooperation is punished by withdrawal of cooperation. We think two 
features of this system are interesting and warrant further study: 

1. Cooperation may be possible in larger groups than is the case with 
reciprocity. This effect invites further study of the limitations on the 
ability of single individuals to punish and how coalitions of punishers 
might or might not be able to induce reciprocity in very large groups. 

2. In the model studied here, punishers collect private benefit by 
inducing cooperation in their group that compensates them for 
punishing, while providing a public good for reluctant cooperators. 
There are often polymorphic equilibria in which punishers are 
relatively rare, generating a simple political division of labor 
reminiscent of the “big man” systems of New Guinea and elsewhere. This 
finding invites study of further punishment strategies. Consider, for 
example, strategies that punish but do not cooperate. Such 
individuals might be able to coerce more reluctant cooperators than 
cooperator-punishers and therefore support cooperation in still 
larger groups. If so, such models might help explain the evolution of 
groups organized by full-time specialized, “parasitical” coercive 
agents like tribal chieftains. 

The importance of the study of retribution can hardly be overestimated. 
The evolution of political complexity in human societies over the last few 
thousand years depended fundamentally on the development of a variety of 
coercive strategies similar to those we have investigated here. 


APPENDIX 

SENSITIVITY OF THE MODEL TO THE RESPONSE TO PUNISHMENT 

The effects of punishment on the evolution of cooperation are strongly affected by 
the extent to which a defector responds to punishment by cooperating. To see this, 
consider a game in which cooperator-punishers (P) compete with the following 
nonresponsive strategy. 

Unconditional defectors (U): Never cooperate. Never punish. 

Many of the evolutionary properties of the two-person repeated prisoner’s 
dilemma can be derived by considering a model in which only tit-for-tat (TFT, cooperate 
on the first move, and punish each defection by defecting) and ALLD (always defect) 
are present. Our strategies P and U seem like the natural generalizations of TFT and 
ALLD to the n-person game with punishment, and one might (as we did) expect that 
their evolutionary dynamics would be similar. This expectation is largely incorrect. 
Understanding why provides useful insight into the evolutionary effects of 
punishment. For simplicity, we assume that there are no errors (e = 0) throughout this 
section. 

Let j be the number of the other n — 1 individuals in the group who are P. The 
expected fitness of U individuals given j is: 

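$$V(U \mid j) = \frac{j\,(b/n - p)}{1 - w} \tag{A1}$$

Similarly, the expected fitness of P individuals given j is: 

$$V(P \mid j) = \frac{(b/n)(j + 1) - c - (n - 1 - j)k}{1 - w} \tag{A2}$$

The expected fitness of U individuals averaged over all groups is: 

$$W(U) = \sum_{j=0}^{n-1} m(j \mid U)\,V(U \mid j) = E(j \mid U)\,\frac{b/n - p}{1 - w} \tag{A3}$$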

where m(j|U) is the probability that there are j other cooperator-punishers, given 
that the focal individual is an unconditional defector, and E(j|U) is the expected 
value of j conditioned on the focal individual being U. An analogous calculation shows that 

$$W(P) = \frac{(b/n)\left(E(j \mid P) + 1\right) - c - \left(n - 1 - E(j \mid P)\right)k}{1 - w} \tag{A4}$$

When groups are formed at random, E(j|P) = E(j|U) = (n - 1)q, where q is the 
frequency of P in the population just before groups are formed. To determine when U is 
an ESS, let q → 0 and determine when W(U) > W(P). To determine when P is an ESS, 
let q → 1 and determine when W(U) < W(P). When groups are formed assortatively 
and P is rare, E(j|P) = (n - 1)r and E(j|U) = 0. Combining these expressions yields 
the condition for P to increase when rare (A6). 

It follows from these expressions for the fitness of U and P that (1) unconditional 
defection is always an ESS, and (2) P is an ESS only if: 

$$c - b/n < (n - 1)p \tag{A5}$$

The left-hand side of (A5) is the per period cost to an individual of cooperating, and 
the right-hand side is the per period cost of being punished by n — 1 individuals. 
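
The stability condition (A5) can be checked numerically against the fitness expressions for U and P. The following is a minimal sketch, assuming those expressions take the form W(U) = E(j|U)(b/n - p)/(1 - w) and W(P) = [(b/n)(E(j|P) + 1) - c - (n - 1 - E(j|P))k]/(1 - w); the function names and parameter values are hypothetical.

```python
def fitness_U(expected_j, n, b, p, w):
    """Average fitness of an unconditional defector (assumed form of A3)."""
    return expected_j * (b / n - p) / (1 - w)


def fitness_P(expected_j, n, b, c, k, w):
    """Average fitness of a cooperator-punisher (assumed form of A4)."""
    return ((b / n) * (expected_j + 1) - c - (n - 1 - expected_j) * k) / (1 - w)


def P_is_ESS(n, b, c, p):
    """Condition (A5): c - b/n < (n - 1) * p. Note that w does not appear."""
    return c - b / n < (n - 1) * p


if __name__ == "__main__":
    # Hypothetical values; when P is common, E(j|.) = n - 1.
    n, b, c, k = 5, 10.0, 3.0, 1.0
    for p in (0.5, 0.2):
        for w in (0.0, 0.5, 0.9):
            stable = fitness_P(n - 1, n, b, c, k, w) > fitness_U(n - 1, n, b, p, w)
            print(f"p={p}, w={w}: P resists U = {stable}, A5 = {P_is_ESS(n, b, c, p)}")
```

Because both fitnesses carry the same factor 1/(1 - w), the comparison gives the same answer for every w, including w = 0 (a single interaction).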

Superficially these properties seem analogous to the competition between 
always-defect and tit-for-tat in the two-person model. Always-defect is always an 
ESS; tit-for-tat is an ESS only under certain conditions. However, notice that (A5) does 
not depend on the parameter w, which measures the average number of interactions. 
Thus, if (A5) is satisfied, P is stable even if individuals interact only once! In contrast, 
tit-for-tat is stable against always-defect only if w is large enough that the long-run 
benefit of reciprocal interaction is greater than the short-term benefit of cheating. 
Tit-for-tat is never stable if individuals interact only once. 

The qualitative difference between the two models is made clearer if we 
consider the effect of assortative group formation. In the two-person case, assortative 
group formation makes it easier for tit-for-tat to increase when rare, and if w is near 
one, even a small amount of assortment is sufficient. In the present model, the 
punishing strategy, P, can increase when rare if the following is true: 

$$\underbrace{(b/n)\left[r(n - 1) + 1\right] - c}_{\text{inclusive fitness}} > \underbrace{(n - 1)(1 - r)k}_{\text{punishment}} \tag{A6}$$

The left-hand side gives the inclusive fitness advantage of cooperators relative to 
defectors. If P individuals are sufficiently likely to interact with other P individuals 
(r → 1), then P can increase in frequency even when it is rare in the population 
because P individuals benefit from the cooperation of other P individuals in their 
groups. The right-hand side gives the effect of punishment on the fitness of P 
individuals. Notice that this term is always positive. This means that cooperation 
supported by punishment is harder to get started in a population than unconditional 
cooperation. 
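
To see how assortment enters, the invasion condition for P can be evaluated for several values of r. This is a sketch under the assumption that the condition reads (b/n)[r(n - 1) + 1] - c > (n - 1)(1 - r)k; the function name and the numbers are illustrative only.

```python
def P_increases_when_rare(n, b, c, k, r):
    """Invasion condition for P under assortment r (assumed form of A6)."""
    inclusive_fitness = (b / n) * (r * (n - 1) + 1) - c
    punishment_cost = (n - 1) * (1 - r) * k
    return inclusive_fitness > punishment_cost


if __name__ == "__main__":
    # Hypothetical parameter values.
    n, b, c, k = 5, 10.0, 3.0, 1.0
    for r in (0.0, 0.3, 0.5, 1.0):
        print(f"r={r}: P increases when rare = {P_increases_when_rare(n, b, c, k, r)}")
```

Setting k = 0 recovers the condition for unconditional cooperation, which is why a positive punishment term makes invasion harder.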

Why are these two models so different? In models without retribution, 
reciprocal strategies such as tit-for-tat are favored because they lead to assortative 
interaction of cooperators (Michod and Sanderson, 1985). Even if individuals are paired at 
random, the fact that tit-for-tat individuals convert to defection if they experience 
acts of defection from others causes a nonrandom distribution of cooperative 
behavior: tit-for-tat individuals are more likely to receive the benefits of cooperation 
than are always-defect individuals. In contrast, in the present model, punishment has 
no effect on who receives the benefits of cooperative behavior. P individuals continue 
to cooperate while they punish, and U individuals do not respond to punishment by 
cooperating; they keep defecting. Models of reciprocity without punishment suggest 
that the strategy of punishing defectors by withdrawing cooperation is unlikely to 
work in large groups (Joshi, 1987; Boyd and Richerson, 1988). However, it is not 
unreasonable to imagine that a kind of conditional defector might respond to 
punishment by cooperating much as tit-for-tat responds to cooperation with more 
cooperation. 

SHOULD DEFECTORS RESPOND TO PUNISHMENT? 

Should defecting individuals respond to punishment by cooperating? To address this 
question, we consider the conditions in which R1 can invade a population in which 
the strategy U is common. We further assume that groups are formed at random. 

Unfortunately, the answer to this question does not depend on the fitness 
consequences of alternative behaviors alone. It also depends on what kinds of 
punishing strategies are maintained in the population by nonadaptive processes like 
mutation and nonheritable environmental variation. In a population in which only U 
and R1 are present (and every individual accurately follows its strategy), U and R1 will 
have the same expected fitness. Both will defect forever and never be punished 
because no punishing strategies are present. The strategies U and R1 will have 
different expected fitnesses only if there are punishing strategies present in the 
population. If U is common, however, the expected fitness of any rare punishing strategy 
must be less than the expected fitness of U. This means that any punishing strategies 
present in the population must be maintained by nonadaptive processes like errors or 
mutation. R1 may or may not be able to invade, depending on the mix of punishing 
strategies maintained by such forces. 

We conjecture that the most plausible source of nonadaptive variation is mistakes 
about the behavioral context. Modelers typically assume that there is a single 
behavioral context, with given costs and benefits, and an unambiguous set of behavioral 
strategies. However, in the real world, there are many behavioral contexts, each with 
its own appropriate strategy. Before deciding how to behave, individuals must 
categorize a particular situation as belonging to one context or another. It seems plausible 
that individuals sometimes miscategorize situations in which punishment is not 
favored and thus mistakenly punish others. Suppose, for example, selection favors 
individual retaliation if others damage personal property. Then individuals might 
sometimes punish others who damage commonly held property because they 
miscategorize the behavior. 

To prove that R1 may or may not be able to invade U, consider a second 
punishing strategy. 

Timid punishers (T1): Always cooperate. Punish each defector the first 
time it defects, but only the first time. 

Suppose that both U and R1 occasionally mistakenly play one of the punishing 
strategies. This could occur because individuals mistake the behavioral context for 
one in which they would normally punish. The relative fitness of U and R1 depends 
on which of these two punishing strategies is present. Suppose that individuals 
occasionally play T1 by mistake. R1 can invade if a focal R1 individual has higher fitness 
than a focal U individual in groups with one T1 individual among the other n - 1. In 
such groups, 

$$W(U) = b/n - p + \frac{w}{1 - w}\,(b/n) \tag{A7}$$

$$W(R_1) = b/n - p + \frac{w}{1 - w}\,(2b/n - c) \tag{A8}$$

Thus, U is always favored if cooperation is costly. In contrast, when P is present as a 
result of errors, the fitnesses of the two types are as follows: 

$$W(U) = b/n - p + \frac{w}{1 - w}\,(b/n - p) \tag{A9}$$

$$W(R_1) = b/n - p + \frac{w}{1 - w}\,(2b/n - c) \tag{A10}$$


Thus, R1 is favored whenever the cost of being punished exceeds the net cost of 
cooperating, that is, whenever p > c - b/n. 
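
The contrast between the two punishing strategies can be made concrete with a small calculation. The sketch below assumes that, in a group containing a single punisher, both focal types receive b/n - p in the first period; that against a timid punisher U then receives b/n per period and R1 receives 2b/n - c; and that against P the per-period payoff of U becomes b/n - p. These payoff forms, the function names, and the parameter values are all assumptions made for illustration.

```python
def fitness_with_timid_punisher(n, b, c, p, w):
    """Return (W_U, W_R1) when the single punisher plays T1 (assumed forms)."""
    future = w / (1 - w)  # expected number of later periods
    w_u = b / n - p + future * (b / n)
    w_r1 = b / n - p + future * (2 * b / n - c)
    return w_u, w_r1


def fitness_with_persistent_punisher(n, b, c, p, w):
    """Return (W_U, W_R1) when the single punisher plays P (assumed forms)."""
    future = w / (1 - w)
    w_u = b / n - p + future * (b / n - p)
    w_r1 = b / n - p + future * (2 * b / n - c)
    return w_u, w_r1


if __name__ == "__main__":
    # Hypothetical values with costly cooperation (c > b/n).
    n, b, c, w = 5, 10.0, 3.0, 0.9
    for p in (0.5, 2.0):
        print(p, fitness_with_timid_punisher(n, b, c, p, w),
              fitness_with_persistent_punisher(n, b, c, p, w))
```

Under these assumptions, responding pays only when punishment persists and p exceeds c - b/n.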

We think that this result is likely to be quite general. Consider a strategy that 
begins cooperating only after being punished some number of times. Such a strategy 
will have higher fitness than an unresponsive strategy only if the punishing strategies 
present in the population continue to punish on subsequent turns. If they do not, the 
unresponsive strategy gets the benefit without paying the cost. When should 
punishing strategies give up? The answer to this question depends on whether the 
defecting strategies will respond. If defecting strategies are unresponsive, costly 
punishment provides no benefits. 


EQUILIBRIA WHEN R1 AND P COMPETE 

Let j be the number of P individuals among the other n— 1 individuals in a group. 
Then the expected fitnesses of the two types are: 
$$\begin{aligned} W(P) = {} & (1 - e)\left[(b/n)\left(E(j \mid P) + 1\right) - c\right] - k\left[n - 1 - (1 - e)E(j \mid P)\right] - ep\,E(j \mid P) \\ & + \frac{w}{1 - w}\left[(1 - e)(b - c) - ek(n - 1) - ep\,E(j \mid P)\right] \end{aligned} \tag{A11}$$

$$\begin{aligned} W(R_1) = {} & \left[(b/n)(1 - e) - p\right]E(j \mid R_1) + \frac{w}{1 - w}\left[(1 - e)(b - c) - ep\,E(j \mid R_1)\right] \\ & - \frac{w}{1 - w}\Pr(j = 0 \mid R_1)(1 - e)(b - c) \end{aligned} \tag{A12}$$

where Pr(j = 0|R1) is the probability that an R1 individual finds himself in a group 
with exactly zero P individuals. 

When groups are formed at random, E(j|P) = E(j|R1) = (n - 1)q and 
Pr(j = 0|R1) = (1 - q)^(n-1), where q is the frequency of P. Making these substitutions leads to 
the following condition for R1 to increase: 

$$(k + p)(n - 1)(1 - q) - \frac{w}{1 - w}(1 - q)^{n-1}(b - c) - b/n + c - p(n - 1) + \frac{ek(n - 1)}{(1 - w)(1 - e)} > 0 \tag{A13}$$

The condition for R1 to be an ESS (7) is derived by setting q = 0 in (A13). The 
condition for P to be an ESS (6) is derived by setting q = 1 in (A13). 
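
The behavior of condition (A13) across frequencies of P can be explored numerically. The sketch below assumes the left-hand side of (A13) has the form (k + p)(n - 1)(1 - q) - [w/(1 - w)](1 - q)^(n-1)(b - c) - b/n + c - p(n - 1) + ek(n - 1)/[(1 - w)(1 - e)], and it locates internal equilibria as sign changes on a grid. The function names, grid size, and parameter values are hypothetical.

```python
def r1_advantage(q, n, b, c, k, p, w, e):
    """Assumed left-hand side of (A13); R1 increases when this is positive."""
    return ((k + p) * (n - 1) * (1 - q)
            - (w / (1 - w)) * (1 - q) ** (n - 1) * (b - c)
            - b / n + c - p * (n - 1)
            + e * k * (n - 1) / ((1 - w) * (1 - e)))


def internal_equilibria(n, b, c, k, p, w, e, steps=10_000):
    """Approximate the values of q in (0, 1) where r1_advantage changes sign."""
    roots = []
    prev_q = 0.0
    prev_val = r1_advantage(prev_q, n, b, c, k, p, w, e)
    for i in range(1, steps + 1):
        q = i / steps
        val = r1_advantage(q, n, b, c, k, p, w, e)
        if prev_val * val < 0:
            roots.append(0.5 * (prev_q + q))  # midpoint of the bracketing cell
        prev_q, prev_val = q, val
    return roots


if __name__ == "__main__":
    # Two hypothetical parameter sets: one with no internal equilibrium
    # and one with a single internal equilibrium.
    print(internal_equilibria(n=4, b=8.0, c=3.0, k=1.0, p=2.0, w=0.9, e=0.01))
    print(internal_equilibria(n=4, b=8.0, c=3.0, k=2.0, p=2.0, w=0.9, e=0.1))
```

A grid scan is a deliberately simple choice here; it can miss a pair of roots that fall inside one grid cell, so a finer grid or a bracketing root finder may be preferable near a tangency.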

To derive the necessary conditions for a stable internal equilibrium, first notice 
that the left-hand side of (A13) is a concave function with, at most, a single internal 
maximum. Thus, if neither R1 nor P is an ESS, then there is a single internal 
equilibrium point. If R1 is not an ESS but P is, then there are two internal equilibria, one 
stable and the other unstable, if, and only if, the value of the left-hand side at that 
maximum is greater than zero. The value of q that maximizes the left-hand side of 
(A13) can be found by differentiation. Substituting this back into (A13) yields the 
following necessary condition for the existence of two internal equilibria: 

$$(n - 2)(k + p)\left[\frac{(1 - w)(k + p)}{w(b - c)}\right]^{1/(n-2)} > p(n - 1) - c + b/n - \frac{ek(n - 1)}{(1 - w)(1 - e)} \tag{A14}$$

If this condition is not satisfied, then P is the only ESS. 

To derive the condition for R1 to increase when groups are formed assortatively, let 
E(j|P) = (n - 1)(r + (1 - r)q) and E(j|R1) = (n - 1)(1 - r)q and proceed in the same way. 


EQUILIBRIA WHEN R1, E, AND P COMPETE 

Let i and j be the numbers of E and P individuals among the other n — 1 individuals. 
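The expected fitnesses of the three types are: 

$$\begin{aligned} W(P) = {} & (1 - e)\left[(b/n)\left(E(i \mid P) + E(j \mid P) + 1\right) - c\right] \\ & - k\left[n - 1 - (1 - e)\left(E(i \mid P) + E(j \mid P)\right)\right] - ep\,E(j \mid P) \\ & + \frac{w}{1 - w}\left[(1 - e)(b - c) - ek(n - 1) - ep\,E(j \mid P)\right] \end{aligned} \tag{A15}$$

$$\begin{aligned} W(R_1) = {} & (b/n)(1 - e)\left(E(i \mid R_1) + E(j \mid R_1)\right) - p\,E(j \mid R_1) \\ & + \frac{w}{1 - w}\Pr(j > 0 \mid R_1)\left[(1 - e)(b - c) - ep\,E(j \mid R_1 \,\&\, j > 0)\right] \\ & + \frac{w}{1 - w}(1 - e)(b/n)\Pr(j = 0 \mid R_1)\,E(i \mid R_1 \,\&\, j = 0) \end{aligned} \tag{A16}$$

$$\begin{aligned} W(E) = {} & (1 - e)\left[(b/n)\left(E(i \mid E) + E(j \mid E) + 1\right) - c\right] - ep\,E(j \mid E) \\ & + \frac{w}{1 - w}\Pr(j > 0 \mid E)\left[(1 - e)(b - c) - ep\,E(j \mid E \,\&\, j > 0)\right] \\ & + \frac{w}{1 - w}(1 - e)\Pr(j = 0 \mid E)\left[(b/n)\left(E(i \mid E \,\&\, j = 0) + 1\right) - c\right] \end{aligned} \tag{A17}$$

Assume that groups are formed at random so that E(j|E) = E(j|P) = E(j|R1) = 
(n - 1)q_P, E(i|E) = E(i|P) = E(i|R1) = (n - 1)q_E, Pr(j = 0|E) = Pr(j = 0|R1) = (1 - q_P)^(n-1), 
and E(i|R1 & j = 0) = E(i|E & j = 0) = (n - 1)(q_E/(1 - q_P)), where q_E and q_P are the 
frequencies of E and P. When q_E = 1, W(E) < W(R1), and when q_P = 1, W(P) < W(E). 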

First, we derive conditions for the existence of an internal equilibrium and show 
that if such an equilibrium exists, it is unique. 

It is useful to define the following functions, which give the difference in fitness 
between each pair of strategies as a function of q P and q E : 

$$d_{PE}(q_P, q_E) = W(P) - W(E) \tag{A18}$$

$$d_{RE}(q_P, q_E) = W(R_1) - W(E) \tag{A19}$$

$$d_{PR}(q_P, q_E) = W(P) - W(R_1) \tag{A20}$$

Using equations (A15), (A16), and (A17) and the assumption of random group 
formation yields the following expression for $d_{RE}$: 

$$d_{RE}(q_P, q_E) = -p(1 - e)(n - 1)q_P + (1 - e)(c - b/n)\left[1 + \frac{w}{1 - w}(1 - q_P)^{n-1}\right] \tag{A21}$$

Notice that the relative fitness of R1 and E depends only on $q_P$. Further, note that (1) 
$d_{RE}(0, q_E) > 0$; (2) $d_{RE}(1, q_E) < 0$ as long as c - b/n < (n - 1)p, which is true by 
assumption; and (3) $d_{RE}$ is a monotonically decreasing function of $q_P$. Thus, the value 
of $q_P$ at equilibrium is unique and can be found by finding the root of $d_{RE} = 0$ as 
shown in figure A1. Let this value of $q_P$ be $\hat{q}_P$. 

Once again, using equations (A15), (A16), and (A17) and the assumption of 
random group formation yields the following expression for $d_{PE}$: 

$$d_{PE}(q_P, q_E) = -\frac{ke(n - 1)}{1 - w} - k(1 - e)(n - 1)(1 - q_E - q_P) + \frac{wb(n - 1)(1 - e)(1 - q_P)^{n-2}(1 - q_P - q_E)}{n(1 - w)}$$

Assume that $q_E$ is fixed at some value. Then 

$$d_{PE}(1 - q_E, q_E) = -\frac{ke(n - 1)}{1 - w}$$

and 

$$d_{PE}(0, q_E) = -\frac{ke(n - 1)}{1 - w} + (1 - q_E)(1 - e)(n - 1)\left(\frac{wb}{n(1 - w)} - k\right)$$

Thus, $d_{PE}(0, q_E) > 0$ if 

$$q_E < 1 - \frac{ke}{(1 - e)\left(w(b/n) - k(1 - w)\right)} \tag{A22}$$

and 

$$w(b/n) - k(1 - w) > 0 \tag{A23}$$

Otherwise, $d_{PE} < 0$. Differentiating shows that $d_{PE}$ is a convex function of $q_P$. Thus, 
if (A22) and (A23) are satisfied, $d_{PE}(q_P, q_E) = 0$ has a unique root for each $q_E$ as 
illustrated in figure A1. Let this root be $q'_P(q_E)$. Increasing $q_E$ leads to a decrease in 
$q'_P(q_E)$. Thus, there is an internal equilibrium value if, and only if, $q'_P(0) > \hat{q}_P$, and if it 
exists, such an equilibrium is unique. This result is shown graphically in figure A1. 

Figure A1. This figure illustrates the logic of the proofs given in this section. The left-hand 
pair of figures represents a situation in which there is a single polymorphic equilibrium 
on the R1 - P boundary. The lower figure shows $d_{PE}(q_P, 0)$ and $d_{RE}(q_P, 0)$. These curves 
intersect only once since there is a single polymorphic equilibrium. Thus, we know that 
$q'_P < \hat{q}_P$. The upper figure shows how the forms of $d_{PE}(q_P, q_E)$ and $d_{RE}(q_P, q_E)$ guarantee 
that there is no internal equilibrium in this case. The right-hand pair of figures represents the 
situation in which there is no polymorphic equilibrium on the R1 - P boundary 
because P increases for all values of $q_P < 1$. 

We know that $d_{RE}(q_P)$ is monotonically decreasing and has one root in the 
interval (0, 1) whenever R1 is potentially present, and that $d_{PE}(q_P)$ has at most one 
root and is monotonically decreasing in the interval that contains the root. 

Next, we show that if there is no stable polymorphic equilibrium on the P - R1 
boundary in the absence of E, then there is an internal equilibrium. If there is no 
stable equilibrium on the boundary in the absence of E, it follows from the results of 
the previous section that 

$$d_{PR}(q_P, 0) > 0$$

for all $q_P$. Next, note that 

$$d_{PR}(q_P, q_E) = d_{PE}(q_P, q_E) - d_{RE}(q_P, q_E) \tag{A24}$$

Thus, there is an internal equilibrium since $d_{PE}(q_P, 0) > d_{RE}(q_P, 0)$ for all values of $q_P$. 
This situation is shown in the right-hand pair of figures in figure A1. 

Next, we show that if there is a stable, polymorphic boundary equilibrium such 
that $q_E = 0$ and W(R1) > W(P) for $q_P = 1$, then there is no internal equilibrium. Let $\tilde{q}_P$ 
be the frequency of P at a polymorphic equilibrium on the P - R1 boundary. Then 
$d_{PR}(\tilde{q}_P, 0) = 0$, which implies that $d_{PE}(\tilde{q}_P, 0) = d_{RE}(\tilde{q}_P, 0)$. The fact that the 
equilibrium is stable in the absence of E implies that $\partial d_{PR}(q_P, 0)/\partial q_P < 0$ at $\tilde{q}_P$. Since 
$d_{PR}(1, 0) < 0$, it follows that $d_{PE}(q_P, 0) < d_{RE}(q_P, 0)$ for $\tilde{q}_P < q_P < 1$. But this means 
that $q'_P(0) < \hat{q}_P$, and, therefore, there is no internal equilibrium as shown in the 
left-hand pair of figures in figure A1. 

It is important to note that there may be no internal equilibrium even if 
W(R1) < W(P). When this is the case, there is a second, unstable internal equilibrium 
on the R1 - P boundary. Anytime that $d_{PE} = d_{RE} < 0$ at this equilibrium, there will be 
no internal equilibrium, and numerical studies suggest that this is what actually 
occurs at the vast majority of parameter combinations. 


M IS AN ESS AGAINST R1 AND E 

Assume that M is common. When groups are formed at random, M can resist 
invasion by rare R1 individuals if the average payoff of M in groups with n - 1 other M 
individuals, V(M|n - 1), is greater than the average payoff of R1 in groups in which 
the other n - 1 individuals are M, V(R1|n - 1): 

$$V(M \mid n - 1) = \frac{1}{1 - w}\left((b - c)(1 - e) - e(n - 1)(k + p)\right) \tag{A25}$$

$$V(R_1 \mid n - 1) = (n - 1)(b/n)(1 - e) - (n - 1)p + \frac{w}{1 - w}\left[(1 - e)(b - c) - p(n - 1)\left(1 - (1 - e)^{n}\right)\right] \tag{A26}$$

Substituting these expressions and simplifying yields condition (9). Similarly, the 
expected fitness of an E individual in a group of n - 1 M individuals, V(E|n - 1), is: 

$$V(E \mid n - 1) = (1 - e)(b - c) - e(n - 1)p + \frac{w}{1 - w}\left[(1 - e)(b - c) - e(n - 1)p - p(n - 1)\left(1 - (1 - e)^{n}\right)\right] \tag{A27}$$

Using this expression to determine when V(M|n - 1) > V(E|n - 1) yields condition (10). 
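
The link between these payoffs and condition (10) can be verified numerically. The sketch below assumes that (A25) and (A27) have the forms V(M|n - 1) = [(b - c)(1 - e) - e(n - 1)(k + p)]/(1 - w) and V(E|n - 1) = x + [w/(1 - w)][x - p(n - 1)(1 - (1 - e)^n)], with x = (1 - e)(b - c) - e(n - 1)p. The function names and parameter values are hypothetical.

```python
def payoff_M(n, b, c, k, p, w, e):
    """Assumed form of (A25): payoff of M among n - 1 other moralists."""
    return ((b - c) * (1 - e) - e * (n - 1) * (k + p)) / (1 - w)


def payoff_E(n, b, c, p, w, e):
    """Assumed form of (A27): payoff of E among n - 1 moralists."""
    per_period = (1 - e) * (b - c) - e * (n - 1) * p
    punished_for_not_punishing = p * (n - 1) * (1 - (1 - e) ** n)
    return per_period + (w / (1 - w)) * (per_period - punished_for_not_punishing)


def condition_10(n, e, w, p, k):
    """Condition (10): (1 - (1 - e)**n) * w * p > e * k."""
    return (1 - (1 - e) ** n) * w * p > e * k


if __name__ == "__main__":
    # Hypothetical parameter values away from the boundary of condition (10).
    n, b, c, p, w, e = 10, 20.0, 4.0, 1.0, 0.9, 0.01
    for k in (0.5, 5.0, 20.0):
        resists = payoff_M(n, b, c, k, p, w, e) > payoff_E(n, b, c, p, w, e)
        print(f"k={k}: V(M) > V(E) is {resists}; condition (10) is {condition_10(n, e, w, p, k)}")
```

Under the assumed forms, the difference V(M) - V(E) reduces to (n - 1)[wp(1 - (1 - e)^n) - ek]/(1 - w), so the two tests should agree.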

NOTE 

We thank Alan Rogers for useful comments and for carefully checking every result in 
this chapter. Joel Peck also provided helpful comments. 
10 Why People Punish Defectors 

Weak Conformist Transmission 
Can Stabilize Costly Enforcement 
of Norms in Cooperative 
Dilemmas 

With Joseph Henrich 

In many societies, humans cooperate in large groups of unrelated 
individuals. Most evolutionary explanations for cooperation combine kinship 
(Hamilton, 1964) and reciprocity (“reciprocal altruism,” Trivers, 1971). These 
mechanisms seem to explain the evolution of cooperation in many species 
including ants, bees, naked mole rats, and vampire bats. However, because social 
interaction among humans often involves large groups of mostly unrelated 
individuals, explaining cooperation has proved a tricky problem for both 
evolutionary and rational choice theorists. Evolutionary models of cooperation using the 
repeated n-person prisoner’s dilemma predict that cooperation is not likely to be 
favored by natural selection if groups are larger than around 10, unless 
relatedness is very high (Boyd and Richerson, 1988). As group size rises above 10, to 
100 or 1000, cooperation is virtually impossible to evolve or maintain with only 
reciprocity and kinship. 1 

Many students of human behavior believe that large-scale human 
cooperation is maintained by the threat of punishment. From this view, cooperation 
persists because the penalties for failing to cooperate are sufficiently large that 
defection “doesn’t pay.” However, explaining cooperation in this way leads to 
a new problem: why do people punish noncooperators? If the private benefits 
derived from punishing are greater than the costs of administering it, punishment 
may initially increase but cannot exceed a modest frequency (Boyd and Richerson, 
1992). Individuals who punish defectors provide a public good, and thus can be 
exploited by nonpunishing cooperators if punishment is costly. Second-order free 
riders cooperate in the main activity but cheat when it comes time to punish 
noncooperators. As a consequence, second-order free riders receive higher 
payoffs than punishers do, and thus punishment is not evolutionarily stable. Adding 
third- (third-order punishers punish second-order free riders) or higher-order 
punishers only pushes the problem back to higher orders. Solving this problem 
is important because there is widespread agreement that the threat of 
punishment plays an important role in the maintenance of cooperation in many human 
societies. 

Social scientists have explained the maintenance of punishment in three 
ways: (1) many authors assume that a state or some other external institution 
does the punishing; (2) others assume punishing is costless (McAdams, 1997; 
Hirshleifer and Rasmusen, 1989); and (3) a few scholars incorporate a recursive 
punishing method in which punishers punish defectors, individuals who fail to 
punish defectors, individuals who fail to punish nonpunishers, and so on in an 
infinite regress (Boyd and Richerson, 1992; Fudenberg and Maskin, 1986). 
However, none of these solutions is satisfactory. While it is useful to assume 
institutional enforcement in modern contexts, it leaves the evolution and 
maintenance of punishment unexplained because at some point in the past there 
were no states or institutions. Furthermore, the state plays a very small role 
in many contemporary small-scale societies that nonetheless exhibit a great deal 
of cooperative behavior. This solution avoids the problem of punishment by 
relocating the costs of punishment outside the problem. The second solution, 
instead of relocating the costs, assumes that punishment is costless. This seems 
unrealistic because any attempt to inflict costs on another must be accompanied 
by at least some tiny cost, and any nonzero cost lands both genetic evolutionary 
and rational choice approaches back on the horns of the original punishment 
dilemma. The third solution, pushing the cost of punishment out to infinity, also 
seems unrealistic. Do people really punish people who fail to punish other 
nonpunishers, and do people punish people who fail to punish people who fail 
to punish nonpunishers of defectors, and so on, ad infinitum? Although the 
infinite recursion is cogent, it seems like a mathematical trick. 


Conformist Transmission in Social Learning 
Can Stabilize Punishment 

In this chapter, we argue that the evolution of cooperation and punishment 
is plausibly a side effect of a tendency to adopt common behaviors during 
enculturation. Humans are unique among primates in that they acquire much of 
their behavior from other humans via social learning. However, both theory and 
evidence suggest that humans do not simply copy their parents, nor do they copy 
other individuals at random (Henrich and Boyd, 1998; Takahasi, 1998; Harris, 
1998). Instead, people seem to use social learning rules like “copy the 
successful” (termed payoff-biased or prestige-biased transmission; see Henrich and 
Gil-White, 2001) and “copy the majority” (termed conformist transmission; Boyd 
and Richerson, 1985; Henrich and Boyd, 1998), which allow them to shortcut 
the costs of individual learning and experimentation and leapfrog directly to 
adaptive behaviors. These specialized social learning mechanisms provide a 
generalized means of rapidly sifting through the wash of information available in the 
social world and inexpensively extracting adaptive behaviors. These social 
learning shortcuts do not always result in the best behaviors, nor do they prevent 
the acquisition of maladaptive behaviors. Nevertheless, when averaged over 
many environments and behavioral domains (e.g., foraging, hunting, social 
interaction, etc.), these cultural transmission mechanisms provide fast and frugal 
means to acquire complex, highly adaptive behavioral repertoires. 

Both theoretical and empirical research indicates that conformist transmis¬ 
sion plays an important role in human social learning. We have already shown 
that a heavy reliance on conformist transmission outcompetes both unbiased 
(i.e., vertical) transmission and individual learning under a wide range of con¬ 
ditions (H enrich and Boyd, 1998), and especially when problems are difficult. 
Second, empirical research by psychologists, economists, and sociologists shows 
that people are likely to adopt common behaviors across a wide range of decision 
domains. Although much of this work focuses on easy perceptual tasks (Asch, 
1951) and confounds normative conformity (going with the popular choice to 
avoid appearing deviant) with conformist transmission (using the popularity of a 
choice as an indirect measure of its worth), more recent work shows that social 
learning and conformist transmission are important in difficult individual problems
(Baron, Vandello, and Brunsman, 1996; Insko, Smith, Alicko, Wade, and
Taylor, 1985; Campbell and Fairey, 1989), voting situations (Wit, 1999), and 
cooperative dilemmas (Smith and Bell, 1994). 

Conformist transmission can stabilize costly cooperation without punishment,
but only if it is very strong. All other things being equal, payoff-biased
transmission causes higher payoff variants to increase in frequency, and thus 
cooperation is not evolutionarily stable under plausible conditions—because not 
cooperating leads to higher payoffs than cooperating. Thus, payoff-biased 
transmission, alone, suffers the same problem as natural selection in genetic 
evolution. However, under conformist transmission individuals preferentially 
adopt common behaviors and thus increase the frequency of the most common 
behavior in the population. Thus, if cooperation is common, conformist transmission
will oppose payoff-biased transmission and, as long as cooperation is not
too costly, maintain cooperative strategies in the population. However, if the 
costs of cooperation are substantial, it is less likely that conformist transmission 
will be able to maintain cooperation. 

A quite different logic applies to the maintenance of punishment. Suppose 
that both punishers and cooperators are common and that being punished is 
costly enough that cooperators have higher payoffs than defectors. Rare invading 
second-order free riders who cooperate but do not punish will achieve higher 
payoffs than punishers because they avoid the costs of punishing. However, 
because defection does not pay, the only defections will be due to rare mistakes, 
and thus the difference between the payoffs of punishers and second-order free 
riders will be relatively small. Hence, conformist transmission is more likely to 
stabilize the punishment of noncooperators than cooperation itself. As we ascend
to higher-order punishing, the difference between the payoffs to punishing
versus nonpunishing decreases geometrically toward zero because the occasions
that require the administration of punishment become increasingly rare. Second-order
punishing is required only if someone erroneously fails to cooperate, and
then someone else erroneously fails to punish that mistake. For third-order
punishment to be necessary, yet another failure to punish must occur. As the
number of punishing stages ($i$) increases, conformist transmission, no matter
how weak, will at some stage overpower payoff-biased imitation and stabilize
common $i$-th order punishment. Once punishment is stable at the $i$-th stage,
payoffs will favor strategies that punish at the $(i - 1)$-th order, because common
punishers at the $i$-th order will punish nonpunishers at stage $i - 1$. Stable punishment
at stage $i - 1$ means payoffs at stage $i - 2$ will favor punishing strategies,
and so on down the cascade of punishment. Eventually, common first-order
punishers will stabilize cooperation at stage 0.

It is important to see that the stabilization of punishment is, from the gene’s
point of view, a maladaptive side effect of conformist transmission. If there were
genetic variability in the strength of conformist transmission ($\alpha$) and cooperative
dilemmas were the only problem humans faced, then conformist transmission
might never evolve. However, human social learning mechanisms were selected
for their capability to efficiently acquire adaptive behaviors over a wide range
of behavioral domains and environmental circumstances (from figuring out
what foods to eat to deciding what kind of person to marry) precisely because
it is costly for individuals to determine the best behavior. Hence, we should expect
conformist transmission to be important in cooperation as long as distinguishing
cooperative dilemmas from other kinds of problems is difficult, costly,
or error-prone. Looking across human societies, we find that cooperative dilemmas
come in an immense variety of forms, including harvest rituals among agriculturalists,
barbasco fishing among Amazonian peoples, warfare, irrigation
projects, taxes, voting, meat sharing, and anti-smoking pressure in public places.
It is difficult to imagine a cognitive mechanism capable of distinguishing cooperative
circumstances from the myriad of other problems and social interactions
that people encounter.

In what is to come, we formalize this argument. Our goal is to demonstrate 
the soundness of our reasoning and show how very weak conformist transmission 
can stabilize cooperation and punishment. After demonstrating this, we will describe
how cooperation, once it is stabilized in one group, can spread across many
populations via cultural group selection. We will also briefly show how genes for 
prosocial behavior may eventually spread in the wake of cultural evolution. 


A Cultural Evolutionary Model of Cooperation and Punishment 

In this model, a large number of groups, each consisting of $N$ individuals, are
drawn at random from a very large population. Individuals within each group
interact with one another in an $(i + 1)$-stage game. The first stage is a one-shot
cooperative dilemma, which is followed by $i$ stages in which individuals can
punish others. We number the first, cooperative stage as 0 and the punishment
stages as $1, \ldots, i$. The behavior of individuals during each stage is determined by
a separate culturally acquired trait with two variants, P (prosocial variant) and
NP (not prosocial variant).

During the initial cooperative dilemma, individuals can either “cooperate”
(contribute to a public good) or “defect” (not contribute and free-ride on the
contributions of others). Each cooperator pays a cost $C$ to contribute a benefit $B$
($B > C$) to the group; this $B$ is divided equally among all group members.
Defectors do not pay the cost of cooperation ($C$) but do share equally in the total
benefits. The variable $p_0$ represents the frequency of individuals in the population
with the cooperative variant in stage 0. People with the cooperative variant
“intend” to cooperate but mistakenly defect with probability $e$. Individuals who
have the defecting variant always defect. This makes sense because, in the real
world, people may intend to cooperate but fail to for some reason. For example,
a friend who plans to help you move may forget to show up or have car trouble
en route. Defectors, however, are unlikely to mistakenly show up on moving day
and start carrying boxes. We will assume errors are rare, so the value of $e$ is small.

During the first punishment stage, individuals can punish those who defected
during the cooperation stage. Doing this reduces the payoff of the individuals
who are punished by an amount $\rho$, at a cost of $\phi$ to the punisher
($\phi < \rho < C$). Individuals with the punishing (P) variant at this stage intend to
punish but mistakenly fail to punish with probability $e$. Nonpunishers, those
with the NP-variant at stage 1, do nothing. We use $p_1$ to stand for the frequency
of first-stage punishers (i.e., individuals who have the P-variant at stage 1), and
$(1 - p_1)$ gives the frequency of first-stage free riders.

During the second punishment stage, individuals with the P-variant punish
those who did not punish the noncooperators during the previous stage with
probability $(1 - e)$ and mistakenly fail to punish with probability $e$. As before,
punishment costs punishers $\phi$ to administer and costs those being punished
an amount $\rho$. Those with the NP-variant at stage 2 do not punish. Let $p_2$ be the
frequency of second-stage punishers. At stage 3, individuals with the P-variant
will punish individuals from stage 2 who failed to punish nonpunishers from
stage 1. The costs of punishment remain the same. Those with the NP-variant
in stage 3 will not punish anyone from stage 2. The pattern repeats as one
descends to stage $i$ in table 10.1 ($p_i$ gives the frequency of punishers at stage $i$).
Because the interaction ends after stage $i$, individuals who fail to punish on stage
$i$ cannot be punished. Note that the trait that controls individual behavior at each
stage has only two variants, and the values of variants at different stages are
independent, so an individual could cooperate at stage 0 (have the P-variant),
not punish at stage 1 (NP-variant), and punish at stage 2 (P-variant).

Table 10.1. Dichotomous traits for cooperation and punishment

Stage   Frequency of P-variant   P-variant                              NP-variant
0       $p_0$                    Cooperate                              Defect
1       $p_1$                    Punish defectors                       Do not punish defectors
2       $p_2$                    Punish nonpunishers at stage 1         Do not punish nonpunishers at stage 1
3       $p_3$                    Punish nonpunishers at stage 2         Do not punish nonpunishers at stage 2
$i$     $p_i$                    Punish nonpunishers at stage $i - 1$   Do not punish nonpunishers at stage $i - 1$

After all the punishments are complete, cultural transmission takes place.
As we explained earlier, two components of human cognition create forces that
change the frequency of the different variants: payoff-biased and conformist-biased
imitation. Equation (1) gives the change in the frequency of stage 0
cooperators as a consequence of payoff-biased and conformist transmission (see
Henrich, 1999).

$$\Delta p_0 = p_0(1 - p_0)\Big[\underbrace{(1 - \alpha)\beta(b_C - b_D)}_{\text{Payoff-biased}} + \underbrace{\alpha(2p_0 - 1)}_{\text{Conformist}}\Big] \tag{1}$$

The parameter $\alpha$ varies from 0 to 1 and represents the strength of conformist
transmission in human psychology relative to payoff-biased transmission. We will
generally assume $\alpha$ is positive but small. Practically speaking, $\alpha$ must be less than
0.50, because otherwise beneficial variants would never spread: once a variant
became common, it would remain common no matter how deleterious. The
second term in equation (1), labeled “conformist,” varies in magnitude from $-\alpha$
to $+\alpha$ and is the component of the overall bias contributed by conformist
transmission. In the term labeled “payoff-biased,” the symbols $b_C$ and $b_D$ are the
payoffs to cooperators and defectors, respectively. The quantity $(b_C - b_D)$, which
we label $\Delta b_0$, gives the difference in payoffs between cooperation (P-variant) and
defection (NP-variant) in stage 0. More generally, $\Delta b_i$ is the difference in payoffs
between the P- and NP-variants during the $i$-th stage. The parameter $\beta$ normalizes
the quantity $\Delta b_i$ so that it varies between $-1$ and $+1$, and therefore
$\beta = 1/|\Delta b_i|_{\max}$. Thus, the term labeled “payoff-biased” varies between $-(1 - \alpha)$
and $+(1 - \alpha)$ and represents the component of the overall bias contributed by
payoff-biased transmission.
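
As a minimal sketch (not part of the original analysis), the recursion in equation (1) can be written as a short Python function that keeps the payoff-biased and conformist components separate. The function name `delta_p` and the numerical values are hypothetical and chosen only for illustration; `delta_b` stands for the payoff difference between the P- and NP-variants, and `beta` is assumed to normalize it to the interval from -1 to +1.

```python
def delta_p(p, alpha, beta, delta_b):
    """One-step change in the frequency p of the P-variant (form of equation 1)."""
    payoff_biased = (1.0 - alpha) * beta * delta_b  # payoff-biased component
    conformist = alpha * (2.0 * p - 1.0)            # conformist component, between -alpha and +alpha
    return p * (1.0 - p) * (payoff_biased + conformist)


# Hypothetical values: the P-variant is common (p = 0.9) but pays slightly less.
with_conformism = delta_p(p=0.9, alpha=0.2, beta=1.0, delta_b=-0.1)
without_conformism = delta_p(p=0.9, alpha=0.0, beta=1.0, delta_b=-0.1)
print(with_conformism, without_conformism)
```

With these illustrative numbers the common variant gains frequency when conformism is present and loses frequency when it is absent, which is the opposition between the two forces described above.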

The expected payoffs, $b$, to the P- and NP-variant at each stage depend on
the rate of errors, the costs of cooperation and punishment, and the frequency of
cooperators and punishers in the population. At stage 0, cooperators receive an
average payoff of $b_C$, while defectors receive an average payoff of $b_D$:

$$
\begin{aligned}
b_C &= (1 - e)\big(p_0 B(1 - e) - C + e(p_0 B - N p_1 \rho)\big), \\
b_D &= (1 - e)(p_0 B - N p_1 \rho), \\
\Delta b_0 &= b_C - b_D = (1 - e)\big(N p_1 (1 - e)\rho - C\big).
\end{aligned} \tag{2}
$$

As we mentioned, the term $\Delta b_0$ gives the difference in payoffs between the
two variants that control stage 0 behavior.
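
The stage 0 payoffs can be checked numerically. The following sketch, under hypothetical parameter values, computes the cooperator and defector payoffs and confirms that their difference equals the closed form $(1 - e)(N p_1 (1 - e)\rho - C)$; the function names are assumptions introduced for illustration.

```python
def stage0_payoffs(p0, p1, N, B, C, rho, e):
    """Expected stage 0 payoffs to cooperators (b_C) and defectors (b_D)."""
    # Benefit share minus punishment received, before the (1 - e) factor is applied.
    base = p0 * B - N * p1 * rho
    b_C = (1 - e) * (p0 * B * (1 - e) - C + e * base)
    b_D = (1 - e) * base
    return b_C, b_D


def delta_b0(p1, N, C, rho, e):
    """Closed-form payoff difference between cooperators and defectors."""
    return (1 - e) * (N * p1 * (1 - e) * rho - C)


# Hypothetical parameter values chosen only for illustration.
b_C, b_D = stage0_payoffs(p0=0.9, p1=0.8, N=10, B=3.0, C=1.0, rho=0.5, e=0.01)
closed = delta_b0(p1=0.8, N=10, C=1.0, rho=0.5, e=0.01)
no_punishers = delta_b0(p1=0.0, N=10, C=1.0, rho=0.5, e=0.01)
print(b_C - b_D, closed, no_punishers)
```

Note that the benefit term cancels in the difference: only the cost of cooperating and the expected punishment for defecting decide which variant pays more.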

A Heuristic Analysis 

Let us first analyze equation (1) by asking under what conditions transmission
will favor cooperation ($\Delta p_0 > 0$) in the absence of stage 1 punishers ($p_1 = 0$).
In this case, $\Delta b_0 = -C(1 - e)$, which is always negative; hence, payoff-biased
transmission never favors cooperation in the absence of punishment. So, to give
cooperation its best chance, we assume that by some stochastic fluctuations, the
frequency of cooperators ends up near one. How big does $\alpha$ have to be so that
conformist transmission overpowers payoff-biased transmission and increases the
frequency of cooperators? The frequency of cooperators increases when

$$\alpha_0 > \frac{\beta C(1 - e)}{1 + \beta C(1 - e)} \tag{3}$$

where $\alpha_i$ (here, $i = 0$) is the minimum value of $\alpha$ that favors the spread or maintenance
of the P-variant at stage $i$ ($\Delta p_i > 0$). With no punishment, $\beta_i = 1/|\Delta b_i|_{\max}$
means $\beta_0 = 1/(C(1 - e))$. As a consequence, $\alpha_0$ must be greater than 0.50, and
as we mentioned earlier, $\alpha_i > 0.50$ seems extremely unlikely because such
high values would prevent the diffusion of novel practices: cultures would be
entirely static (see Henrich, 2001). Hence, conformist transmission, operating
directly on cooperative strategies, is unlikely to maintain cooperation in the absence
of punishment.
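
The intermediate steps behind this threshold are short. The following worked derivation is a sketch that assumes the frequency of cooperators is close to one (so that $2p_0 - 1 \approx 1$) and that no stage 1 punishers are present:

```latex
\begin{aligned}
\Delta p_0 > 0
  \;&\Longleftrightarrow\; (1-\alpha)\beta\,\Delta b_0 + \alpha(2p_0 - 1) > 0 \\
  \;&\Longleftrightarrow\; \alpha > (1-\alpha)\,\beta C(1-e)
      \qquad \big(\Delta b_0 = -C(1-e),\; p_0 \to 1\big) \\
  \;&\Longleftrightarrow\; \alpha > \frac{\beta C(1-e)}{1 + \beta C(1-e)}, \\
\beta_0 = \frac{1}{C(1-e)}
  \;&\Longrightarrow\; \alpha_0 > \frac{1}{1+1} = \frac{1}{2}.
\end{aligned}
```

The same rearrangement, a threshold of the form $x/(1+x)$ with $x$ equal to the normalized payoff disadvantage, recurs at every later stage.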

Now, let us examine the conditions under which first-stage punishment will
increase in frequency. Again, the change in the frequency of first-stage punishers,
$\Delta p_1$, is affected by both payoff-biased and conformist transmission:

$$\Delta p_1 = p_1(1 - p_1)\big[(1 - \alpha)\beta(b_{P1} - b_{NP1}) + \alpha(2p_1 - 1)\big] \tag{4}$$

The payoffs ($b$'s) to punishment and nonpunishment depend on the cost of
punishing ($\phi$) and of being punished ($\rho$), as well as the chance of mistakenly not
punishing ($e$). The subscript P1 indicates the P-variant at stage 1, while NP1
indicates the NP-variant at stage 1.

$$
\begin{aligned}
b_{P1} &= -(1 - e)N\phi(1 - p_0 + p_0 e) - eNp_2\rho(1 - e), \\
b_{NP1} &= -Np_2(1 - e)\rho, \\
\Delta b_1 &= b_{P1} - b_{NP1} = -N(1 - e)\big(\phi(1 - (1 - e)p_0) - p_2(1 - e)\rho\big).
\end{aligned} \tag{5}
$$

Assuming that there is only one punishment stage ($i = 1$), and that cooperators
and stage 1 punishers are initially common ($p_0 = 1$ and $p_1 = 1$), then
$\Delta b_1 = -N(1 - e)e\phi$. If errors are rare enough that terms involving $e^2$ are negligible,
then $\Delta b_1 \approx -Ne\phi$. Thus, the difference in payoff between the P-variant and
the NP-variant at stage 1 is just the cost of punishing cooperators who make
errors. If $e < 1/N$, which is plausible unless groups are very large, then $|\Delta b_1|$ is
less than $\phi$, and smaller than $|\Delta b_0|$ because $\phi < \rho < C$. Note that, when $i > 0$,
$\beta = 1/(N(1 - e)(\rho(1 - e) + e\phi))$, so the threshold value of $\alpha$ necessary to stabilize
cooperation in a two-stage game (see note 2), $\alpha_1$, is:

$$\alpha_1 = \frac{\phi e}{\rho(1 - e) + 2\phi e} \approx \frac{e\phi}{\rho} \tag{6}$$

Equation (6) tells us that $\alpha_1$ depends only on the error rate and the ratio of the
cost of punishing to the cost of being punished. It also says that unless punishing
is much more costly than being punished ($2\phi e > \rho$), the threshold strength of
conformism necessary to maintain first-stage punishment is small and less than the
amount of conformism necessary to stabilize 0-th stage cooperation ($\alpha_0 > \alpha_1 \approx e$).
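
To see how small this threshold is, the sketch below computes it two ways under hypothetical parameter values: first from the normalized payoff difference at the cooperative equilibrium, using the rule that conformism wins when $\alpha > x/(1 + x)$ with $x = \beta|\Delta b_1|$, and then from the closed form $\phi e / (\rho(1 - e) + 2\phi e)$. The helper name `threshold` is an assumption introduced for illustration.

```python
def threshold(x):
    """Smallest alpha satisfying -alpha + (1 - alpha) * x < 0, i.e. alpha > x / (1 + x)."""
    return x / (1.0 + x)


# Hypothetical values with phi < rho, and e < 1/N.
N, e, phi, rho = 10, 0.01, 0.1, 0.5

beta = 1.0 / (N * (1 - e) * (rho * (1 - e) + e * phi))  # normalization used when i > 0
delta_b1 = -N * (1 - e) * e * phi                        # p0 = p1 = 1 and no stage 2 punishers
alpha_1 = threshold(beta * abs(delta_b1))
closed_form = phi * e / (rho * (1 - e) + 2 * phi * e)
approximation = e * phi / rho
print(alpha_1, closed_form, approximation)
```

Under these assumed values the threshold is roughly 0.002, far below the value of 0.50 needed to stabilize cooperation directly.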

If we do the same analysis for stage 2, we get the following expressions for
$\Delta p_2$ and $\Delta b_2$:

$$\Delta p_2 = p_2(1 - p_2)\big[(1 - \alpha)\beta\,\Delta b_2 + \alpha(2p_2 - 1)\big] \tag{7}$$

where

$$\Delta b_2 = b_{P2} - b_{NP2} = -(1 - e)N\big[\phi(1 - p_1(1 - e))(1 - p_0^N(1 - e)^N) - p_3(1 - e)\rho\big] \tag{8}$$

The first term inside the square brackets in equation (8) is proportional to the
number of individuals who did not punish during stage 1, $(1 - p_1(1 - e))$, and
to the probability that there was at least one defector during stage 0,
$(1 - p_0^N(1 - e)^N)$. The quantity $p_0(1 - e)$ is the expected frequency of cooperators
who did not make a mistake; thus, $(p_0(1 - e))^N$ gives the probability that a group
contains all cooperators who did not make a mistake. To get the probability
that a group contains at least one defector, we simply subtract this probability
from one. The second term inside the brackets is the cost of being punished
during stage 3 for failing to punish during stage 2. If no third-stage punishers
exist ($p_3 = 0$), and first-stage punishers and cooperators are initially very common,
then $\Delta b_2 \approx -(eN)^2\phi$. Note that the difference in payoffs, $\Delta b_2$, is a factor of $eN$
smaller than $\Delta b_1$, but the strength of conformist transmission remains constant.
Calculating the required size of $\alpha_2$, we get:

$$\alpha_2 = \frac{N\phi e^2}{\rho(1 - e) + e\phi(1 + Ne)} \approx \frac{e\phi}{\rho}Ne \tag{9}$$

Equation (9) demonstrates that $0 < \alpha_2 < \alpha_1 < \alpha_0 = \tfrac{1}{2}$. In this case, $\alpha_2 \approx Ne\,\alpha_1$.

If we repeat this calculation for games with more punishment stages, we find
that, although punishment during the last stage of the game is never favored by
payoff-biased transmission alone, any positive amount of conformist transmission
($\alpha > 0$) will, for some finite number of stages, overcome payoff-biased transmission
and stabilize punishment. For any value $i$ ($i > 0$), the amount of conformist
transmission required to stabilize punishment at the $i$-th stage is:

$$\alpha_i = \frac{\phi e(Ne)^{i-1}}{\rho(1 - e) + e\phi\big(1 + (Ne)^{i-1}\big)} \tag{10}$$

Equation (10) shows that the minimum amount of conformism necessary to
stabilize punishment during the last stage, $\alpha_i$, gets smaller and smaller for greater
values of $i$ (assuming $e < 1/N$).
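
The geometric decline of the threshold can be illustrated with a few lines of Python. This is a sketch under hypothetical parameter values, not the authors' code: `alpha_i` implements the threshold $\phi e (Ne)^{i-1} / (\rho(1 - e) + e\phi(1 + (Ne)^{i-1}))$, and the helper `stages_needed` (an assumed name) searches for the smallest number of punishment stages at which a given strength of conformism exceeds that threshold.

```python
def alpha_i(i, N, e, phi, rho):
    """Minimum conformism needed to stabilize punishment at the last stage i (i >= 1)."""
    k = (N * e) ** (i - 1)
    return phi * e * k / (rho * (1 - e) + e * phi * (1 + k))


def stages_needed(alpha, N, e, phi, rho, max_stages=1000):
    """Smallest i with alpha_i < alpha, or None if not reached within max_stages."""
    for i in range(1, max_stages + 1):
        if alpha_i(i, N, e, phi, rho) < alpha:
            return i
    return None


# Hypothetical values with e < 1/N, so that N * e < 1.
N, e, phi, rho = 10, 0.01, 0.1, 0.5
thresholds = [alpha_i(i, N, e, phi, rho) for i in range(1, 5)]
print(thresholds)
print(stages_needed(1e-4, N, e, phi, rho))
```

With these assumed values each additional stage lowers the threshold by roughly a factor of $Ne = 0.1$, so even a conformist bias as weak as 0.0001 is sufficient once there are three punishment stages.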

Once conformist transmission overcomes payoff-biased transmission and
stabilizes punishment at stage $i$, punishment at stage $i - 1$ will be stabilized
because nonpunishers at stage $i - 1$ will be punished by frequent punishers
during stage $i$. Once punishing strategies are common and stable at stage $i - 1$,
frequent punishers at $i - 1$ will cause payoff-biased transmission to favor the
prosocial variant at stage $i - 2$. In most cases, a combination of punishment and
conformist transmission will eventually stabilize cooperation at stage 0.
However, if $C$ is sufficiently greater than $N\rho(1 - e)$, then stable punishment at
stage 1 will not be able to overcome the costs of cooperation at stage 0, and 
cooperation will not be maintained, despite stable, high-frequency first-stage 
punishers.

Formal Stability Analysis

A more rigorous local stability analysis of the complete set of recursions supports
the heuristic argument just given. Consider the set of $i + 1$ difference equations
where $\Delta p_j$ ($j = 0, 1, \ldots, i$; see the Appendix) provides the dynamics of the behavioral
traits at each stage. The cooperative equilibrium point ($p_0 = 1$,
$p_1 = 1, \ldots, p_i = 1$) is locally stable under two distinct conditions:

Stability Condition 1 

When $i > 0$ and $C < \rho(1 - e)N + (eN)^i\phi$, the cooperative equilibrium is locally
stable when:

$$\lambda_i = -\alpha + (1 - e)(1 - \alpha)\beta\phi(Ne)^i < 0 \tag{11}$$

where $\beta = 1/(N(1 - e)(\rho(1 - e) + e\phi))$. First, note that if $\alpha = 0$, the cooperative
equilibrium is never stable because all the parameters involved are always positive.
However, as long as $\alpha$ is positive and $e < 1/N$, then the system of equations
will be stable for some finite value of $i$. Substituting in the value of $\beta$, and solving
equation (11) for $\alpha$, we find that the minimum value of $\alpha$ is:

$$\alpha_i = \frac{e\phi(Ne)^{i-1}}{\rho(1 - e) + e\phi\big(1 + (Ne)^{i-1}\big)} \tag{12}$$

which is the same value, given in equation (10), derived using a less formal
argument.


Stability Condition 2 

However, if $C > \rho(1 - e)N + (eN)^i\phi$ and $i > 0$, then the cooperative equilibrium
is stable when:

$$\lambda_0 = -\alpha + (1 - \alpha)(1 - e)\beta\big(C - (1 - e)N\rho\big) < 0 \tag{13}$$

If we then solve this for the values of $\alpha$ that create a stable cooperative equilibrium,
we find:

$$\alpha > \frac{\beta(1 - e)\big(C - (1 - e)N\rho\big)}{1 + \beta(1 - e)\big(C - (1 - e)N\rho\big)} \tag{14}$$

Under stability condition 2, $\beta = 1/(C(1 - e))$ (see note 3), so:

$$\alpha > \frac{1 - [N\rho(1 - e)/C]}{2 - [N\rho(1 - e)/C]} \tag{15}$$

The term $N\rho(1 - e)/C$ is always between zero and one, so the required $\alpha$ is
always less than $\tfrac{1}{2}$. This means that, even when the expected cost of being
punished by everyone does not exceed the cost of cooperation (or the cost saved
by defecting), the cooperative equilibrium can still be favored. Intuitively, this is
the case in which conformist transmission and punishment combine to overcome
the cost of cooperation. As with the previous condition, however, it is conformist
transmission that stabilizes $i$-th stage punishment, which stabilizes first-stage
punishment.
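
Both stability conditions can be checked numerically from the eigenvalues listed in the Appendix. The sketch below is an illustration under hypothetical parameter values; the function names are assumptions, and the choice of $\beta$ follows the two conditions as stated in the text while ignoring the narrow intermediate range mentioned in note 3.

```python
def eigenvalues(i, alpha, N, e, phi, rho, C, beta):
    """Eigenvalues lambda_0 ... lambda_i at the cooperative equilibrium (Appendix form)."""
    scale = (1 - alpha) * (1 - e) * beta
    lams = [-alpha + scale * (C - (1 - e) * N * rho)]          # stage 0
    for j in range(1, i):                                      # stages 0 < j < i
        lams.append(-alpha + scale * ((e * N) ** j * phi - rho * N * (1 - e)))
    lams.append(-alpha + scale * (e * N) ** i * phi)           # last stage i
    return lams


def is_locally_stable(i, alpha, N, e, phi, rho, C):
    """True when the dominant (largest) eigenvalue is negative."""
    if C < rho * (1 - e) * N + (e * N) ** i * phi:             # stability condition 1
        beta = 1.0 / (N * (1 - e) * (rho * (1 - e) + e * phi))
    else:                                                      # stability condition 2
        beta = 1.0 / (C * (1 - e))
    return max(eigenvalues(i, alpha, N, e, phi, rho, C, beta)) < 0


# Hypothetical values. Condition 1: cheap cooperation (C = 1), two punishment stages.
cond1_weak = is_locally_stable(i=2, alpha=0.001, N=10, e=0.01, phi=0.1, rho=0.5, C=1.0)
cond1_none = is_locally_stable(i=2, alpha=0.0, N=10, e=0.01, phi=0.1, rho=0.5, C=1.0)
# Condition 2: costly cooperation (C = 12), one punishment stage.
cond2_strong = is_locally_stable(i=1, alpha=0.4, N=10, e=0.01, phi=0.1, rho=0.5, C=12.0)
cond2_weak = is_locally_stable(i=1, alpha=0.3, N=10, e=0.01, phi=0.1, rho=0.5, C=12.0)
print(cond1_weak, cond1_none, cond2_strong, cond2_weak)
```

In this illustration, very weak conformism suffices under condition 1, whereas under condition 2 the required strength is much larger, although still below one half.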

At first, stability condition 2 may seem strange, but the world is seemingly 
full of cases in which the costs of being punished seem insufficient to explain the 
observed degree of cooperation. Hence, this may illuminate such things as why 
Americans pay too much in taxes (i.e., more than they should assuming most 
people pay because they fear punishment; Skinner and Slemrod, 1985), why 
Americans wait in line, why the Ache share meat (Kaplan and Hill, 1985), and 
why people bother going to the voting booth (Mueller, 1989)—all of which 
seem overly cooperative, given the expected penalty. As we show, this may be
important from a cultural group selection perspective because groups that
minimize the costs of punishing and being punished ($\rho$ and $\phi$), while still maintaining
cooperation, will do better than those that rely heavily on punishment to
maintain cooperation.


Once Cooperation Is Stabilized, It Can Spread by
Cultural Group Selection

By itself, the present model does not provide an explanation for human cooperation.
We have shown that, under plausible conditions, a relatively weak
conformist tendency can stabilize punishment and therefore cooperation. However,
noncooperation and nonpunishment are also an equilibrium of the model,
and we have given no reason, so far, why most populations should stabilize at the 
cooperative equilibrium rather than the noncooperative equilibrium. However, 
when there are multiple stable cultural equilibria with different average payoffs, 
cultural group selection can lead to the spread of the higher payoff equilibrium. As 
we have demonstrated, cultural evolutionary processes will cause groups to exist 
at different behavioral equilibria. This means that different groups have different
expected payoffs (due to different degrees of economic production, for example).
The expected payoff of individuals from cooperative groups is
$b \approx (1 - e)(B - C - eN(\phi + \rho(1 + i)))$, while the expected payoff of individuals in
noncooperative/nonpunishing groups is zero. Thus, cooperative groups will have
a higher average payoff as long as the benefits of cooperation are bigger than the
costs of cooperation and punishment. The combination of conformism and
payoff-biased transmission must also be strong enough to maintain stable cooperation
in the face of migration between groups. Such persistent differences
between groups create the raw materials required by cultural group selection.

Cultural group selection can operate in a number of ways to spread prosocial
behaviors. Cooperative groups will have higher total production and, consequently,
more resources that can support more rapid population growth
relative to noncooperative groups. Or cooperative groups may be better able to
marshal and supply larger armies than noncooperative groups and hence be more
successful in warfare and conquest. However, although these factors may be
important (see Bowles, 2000), another, slightly subtler, cultural group selection
process may also be significant. Payoff-biased imitation means people will
preferentially copy individuals who get higher payoffs. The higher an individual’s
payoff, the more likely that individual is to be imitated. If individuals have
occasion to imitate people in neighboring groups, people from cooperative populations
will be preferentially imitated by individuals in noncooperative populations
because the average payoff to individuals from cooperative populations
is much higher than the average payoff of individuals in noncooperative populations.
Boyd and Richerson (2000) have shown that, under a wide range of
conditions (and fairly quickly), this form of cultural group selection will deterministically
spread group-beneficial behaviors from a single group (at a group-beneficial
equilibrium) through a meta-population of other groups, which were
previously stuck at a more individualistic equilibrium.


Culturally Evolved Cooperation May Cause Genes for 
Prosocial Behavior to Proliferate 

Once the cooperative equilibrium becomes common, it is plausible that natural 
selection acting on genetic variation will favor genes that cause people to cooperate
and punish, because such genes decrease an individual’s chance of
suffering costly punishment. This could arise in many ways. Individuals might
develop a preference for cooperative or punishing behaviors that increases their 
likelihood of acquiring such behaviors. Or, alternatively, natural selection might 
increase the reliance on conformist transmission, making people more likely to 
acquire the most frequent behavior. 

Here, we analyze the case in which the probability of mistakenly defecting
or not punishing, $e$, varies genetically. We assume that cultural evolution is much
faster than genetic evolution, which implies that the population exists at a
culturally evolved cooperative equilibrium. Further assume that while most
individuals still make errors at the rate $e$, rare mutant individuals have a slightly
different error probability of $e'$ ($= e - \varepsilon$), where $\varepsilon$ is small ($|\varepsilon| \ll e$). If we assume
that an individual’s average payoff, $b$, is proportional to her average genetic
fitness, then we can ask whether prosocial mutants will spread. The expected
fitnesses for the two types, $F$ and $F_m$ (“m” for mutant), and the difference
between them, $\Delta F$, are as follows (assuming $i > 0$; see note 4):

$$
\begin{aligned}
F &\approx (1 - e)\big(B - C - eN(\phi + \rho(1 - e)(i + 1))\big), \\
F_m &\approx B(1 - e) - C(1 - e') - N\big(e\phi + e'\rho(1 - e)(i + 1)\big), \\
\Delta F &= F_m - F = \varepsilon\big(N\rho(i + 1) - C\big).
\end{aligned} \tag{16}
$$

When $\Delta F$ is positive, prosocial genes can invade. If $C < (1 - e)N\rho + (eN)^i\phi$
(stability condition 1), then $C$ is always less than $N\rho(1 - e)(i + 1)$, and prosocial
genes are always favored. Once at fixation, these prosocial genes cannot be invaded
by more error-prone, anti-social individuals.
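
The sign of the fitness difference is easy to explore numerically. The sketch below implements only the final expression $\Delta F = \varepsilon(N\rho(i + 1) - C)$; the function name and the parameter values are hypothetical and serve only as an illustration.

```python
def fitness_difference(eps, N, rho, C, i):
    """Approximate fitness advantage of a mutant whose error rate is lower by eps."""
    return eps * (N * rho * (i + 1) - C)


# Hypothetical values: a prosocial mutant (eps > 0) in a group with one punishment stage.
cheap_cooperation = fitness_difference(eps=0.001, N=10, rho=0.5, C=1.0, i=1)
costly_cooperation = fitness_difference(eps=0.001, N=10, rho=0.5, C=12.0, i=1)
print(cheap_cooperation, costly_cooperation)
```

When cooperation is cheap relative to the accumulated punishment $N\rho(i + 1)$, the mutant with fewer errors does better; when cooperation is costly enough, the sign reverses and making more errors pays.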

In stability condition 2, where $C > (1 - e)N\rho + (eN)^i\phi$, prosocial genes are
favored (for $i > 0$) when:

$$1 + \frac{(Ne)^i\phi}{N\rho(1 - e)} < \frac{C}{N\rho(1 - e)} < i + 1 \tag{17}$$

which is a wide range, since the smallest possible value of $i$ is 1. However, there
exists a range of conditions in which culturally evolved cooperation is stable, but
prosocial genes cannot invade; in fact, anti-social genes (genes favoring more
mistakes) may invade. This occurs when (for $i > 0$):

$$\underbrace{(i + 1) <}_{\text{No prosocial}} \frac{C}{N\rho(1 - e)} \underbrace{< \frac{1 - \alpha}{1 - 2\alpha}}_{\text{Stability}} \tag{18}$$

When condition (18) holds, cultural transmission will stabilize cooperation, but
prosocial genes will not be able to invade; instead, anti-social genes will be
favored (i.e., $\varepsilon$ is negative). Note, however, that the minimum value of $\alpha$ for this
condition to exist requires $\alpha > 0.333$, which occurs when $i = 1$. Generally, we
believe $\alpha$ is much smaller than this, but we will await the verdict of future empirical
work. Interestingly, this anti-social invasion is likely to occur in the groups
most favored by cultural group selection: those who maximize group payoff by
minimizing punishment costs ($\rho$ and $\phi$), without destabilizing cooperation. Unfortunately,
anti-social invasion will decrease average payoffs and may eventually
destabilize cooperation. Further work on this gene-culture interaction will require
coevolutionary models that combine both cultural and genetic evolutionary
processes (perhaps using quantitative traits) and particularly the cultural
group selection process we have described.
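
The two regimes under stability condition 2 can be separated with a small classifier. This sketch works directly from the stability threshold in equation (15) and the sign of $\Delta F$ in equation (16); the function name `condition2_regime` and the numerical values are hypothetical.

```python
def condition2_regime(C, N, rho, e, phi, i, alpha):
    """Return (culturally_stable, prosocial_genes_favored) under stability condition 2."""
    if C <= rho * (1 - e) * N + (e * N) ** i * phi:
        raise ValueError("parameters fall under stability condition 1, not condition 2")
    r = N * rho * (1 - e) / C                        # between zero and one under condition 2
    culturally_stable = alpha > (1 - r) / (2 - r)    # threshold from equation (15)
    prosocial_favored = N * rho * (i + 1) - C > 0    # sign of Delta F for eps > 0, equation (16)
    return culturally_stable, prosocial_favored


# Hypothetical values with a single punishment stage (i = 1).
moderate_cost = condition2_regime(C=8.0, N=10, rho=0.5, e=0.01, phi=0.1, i=1, alpha=0.3)
high_cost = condition2_regime(C=12.0, N=10, rho=0.5, e=0.01, phi=0.1, i=1, alpha=0.4)
print(moderate_cost, high_cost)
```

In the second case cooperation is culturally stable but prosocial genes are not favored, and this only happens for a conformist bias above one third, in line with the bound stated in the text.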

As we have begun to model it here, prosocial genes are not strongly selected
against in noncooperative populations because error making, in terms of mistaken
cooperation and punishment, occurs only when individuals adopt prosocial
traits; defectors do not mistakenly cooperate. So, if the world is a mix of cooperative
and noncooperative populations, prosocial genes will be favored in a
wide range of circumstances in cooperative populations and will be comparatively
neutral in noncooperative populations. It is possible that incorporating
defector errors, in the form of mistaken cooperation or punishment, may affect
this prediction. Furthermore, cooperation may not be a dispositional trait of
individuals, but rather a specific behavior or value tied only to certain cultural
domains. Some cultural groups, for example, may cooperate in fishing and house-building
but not warfare. Other groups may cooperate in warfare and fishing but
not house-building. Such culturally transmitted traits would have the form
“cooperate in fishing,” “cooperate in house-building,” and “do not cooperate
in warfare,” rather than the more dispositional approach of simply “cooperate”
versus “do not cooperate.” If this is the case, then the migration and spread
of prosocial genes become more difficult. As prosocial genes spread among
groups with different stable cooperative domains, individuals with such genes
would be more likely to mistakenly cooperate in noncooperative cultural domains.
For example, in cultures where people cooperate in fishing but not
warfare, individuals with prosocial genes may be more likely to mistakenly
cooperate in warfare (and pay the cost), as well as less likely to mistakenly defect
in cooperative fishing. We intend to pursue these avenues in subsequent work.

Conclusion

We have done three things in this chapter. First, we have shown that, if humans
possess a psychological bias toward copying the majority, as well as a bias toward
imitating the successful, then cultural evolutionary processes will stabilize cooperation
and punishment for some finite number of punishment stages. Second,
we discussed how, once cooperation is stable, a particular form of cultural group 
selection is likely to spread these group-beneficial cultural traits through human 
populations. And finally, we have demonstrated that prosocial genes, which 
cannot otherwise spread, can invade in the wake of these cultural evolutionary 
processes, under a wide range of conditions. 


APPENDIX 


For all $i$:

$$\Delta p_i = p_i(1 - p_i)\big[(1 - \alpha)\beta(\Delta b_i) + \alpha(2p_i - 1)\big]$$

Difference in payoff for $i = 0$:

$$\Delta b_0 = b_C - b_D = (1 - e)\big(N p_1(1 - e)\rho - C\big)$$
Difference in payoffs for $i > 0$:

$$\Delta b_i = b_{Pi} - b_{NPi} = -(1 - e)\Big(\phi\prod_{j=0}^{i-1} N\big(1 - p_j(1 - e)\big) - N p_{i+1}(1 - e)\rho\Big)$$


Eigenvalues for the system of $i + 1$ equations with punishment up to the $i$-th
stage:

$$
\begin{aligned}
\lambda_0 &= -\alpha + (1 - \alpha)(1 - e)\beta\big(C - (1 - e)N\rho\big), \\
\lambda_j &= -\alpha + (1 - \alpha)(1 - e)\beta\big((eN)^j\phi - \rho N(1 - e)\big), \quad 0 < j < i, \\
\lambda_i &= -\alpha + (1 - \alpha)(1 - e)\beta(eN)^i\phi.
\end{aligned}
$$

When the dominant eigenvalue (that with the largest value) is less than zero, the
system is locally stable at point (po, pi ,..., pi+i) = (1, 1,..., 0). 
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NOTES 

We would like to thank Natalie Smith, Herbert Gintis, and the anonymous re¬ 
viewers for their assistance and suggestions in preparing this chapter. 

1. Two other explanations for cooperation go by the handles by-product mutu¬ 
alism (Brown, 1983] and group selection (Sober and Wilson, 1998], In by-product 
mutualism, individuals who “cooperate” get a higher payoff (have a higher expected 
fitness) than noncooperators. The cooperative contribution to the fitness of others is 
simply a by-product of narrow self-interest. That is, in the process of helping myself, 
I also help you “by accident.” Hence, although this situation may abound in nature, 
it is not the situation we are interested in (and not cooperation by many definitions). 
And, while genetic group selection may explain some cooperation in nature (e.g., 
honeybees; see Seeley, 1995), we believe that gene flow rates between human pop¬ 
ulations, relative to selection, are too high to maintain the required variation between 
groups (Richerson and Boyd, 1998). 

2. Note that under a small range of conditions, when $C > N(\rho(1 - e) + e\phi)$, the
system can still remain stable. Under these conditions, however, $\beta$ becomes
$1/C(1 - e)$. For simplicity, we leave this nuance until later in the chapter.

3. Actually, there is a tiny range, $(N\rho(1 - e) + \phi(eN)^i) < C < (N\rho(1 - e) + N\phi e)$,
under which $\beta$ still equals $1/(N(1 - e)(\rho(1 - e) + e\phi))$. Nothing particularly
interesting happens in this range, so we will not discuss it. Note that if $i = 1$, the range
is nonexistent.

4. If conformist transmission alone can stabilize cooperation without any 
punishment ($i = 0$), then $\Delta F < 0$, and prosocial genes will never spread.
11 Can Group-Functional Behaviors Evolve by Cultural Group Selection?

An Empirical Test

With Joseph Soltis


Many anthropologists explain human behavior and social institutions
in terms of group-level functions (Rappaport, 1984; Lenski and Lenski,
1982; Harris, 1979; Radcliffe-Brown, 1952; Aberle, Cohen, Davis, Levy, and 
Sutton, 1950; Malinowski, 1984 [1922]; Spencer, 1891). According to this view, 
beliefs, behaviors, and institutions exist because they promote the healthy 
functioning of social groups. Such functionalists believe that the existence of an 
observed behavior or institution is explained if it can be shown how the behavior 
or institution contributes to the health or welfare of the social group. Most 
functionalists in anthropology have not explained how group-beneficial beliefs 
and institutions arise or by what processes they are maintained (Turner and
Maryanski, 1979). When functionalists do provide a mechanism for the generation
or maintenance of group-level adaptations, it is usually in terms of selection
among social groups.¹ Functionalists believe that societies have many functional
prerequisites. Social groups whose culturally transmitted values, beliefs, and 
institutions do not provide for these prerequisites become extinct, leaving only 
those societies with functional cultural attributes as survivors. We refer to this 
process as “cultural group selection” because it involves the differential survival 
and proliferation of culturally variable groups. 

Cultural group selection is analogous to genetic group selection but acts on 
cultural rather than genetic differences between groups. This distinction is important.
We will argue that cultural variation is more prone to group selection
than genetic variation and that this may explain why human societies, in contrast 
to those of other animals, are frequently cooperative on scales far larger than kin 
groups. More generally, recent theoretical work on the processes of cultural 
evolution shows that there are many parallels between cultural and genetic 
evolution but also some fundamental differences (Durham, 1991; Boyd and
Richerson, 1985; Cavalli-Sforza and Feldman, 1981; Pulliam and Dunford, 1980). 
To date, empirical investigations focused on these processes are few (but see, e.g., 
Cavalli-Sforza, Feldman, Chen, and Dornbusch, 1982). In addition to conducting 
empirical studies specifically designed to investigate these processes, it is possible 
to use many of the data collected by social scientists for other purposes. Here we 
use a small part of the very rich ethnographic record produced by anthropologists 
to test the empirical plausibility of the process of cultural group selection. 

As emphasized by Campbell (1965, 1975, 1983), cultural group selection 
requires that (1) there be cultural differences among groups, (2) these differences
affect persistence or proliferation of groups, and (3) these differences be
transmitted through time. If these three conditions hold, then, other things being 
equal, cultural attributes that enhance the persistence or proliferation of social 
groups will tend to spread. There is no guarantee, however, that this process will 
be sufficiently powerful to overcome other social processes that act to produce 
other outcomes. There are two problems with cultural group selection as an 
explanation for the existence of group-beneficial traits: maintenance of variation 
among groups and rate of adaptation. 

Group-functional explanations may be in conflict with the fact that human 
choices are at least partly self-interested. To the extent that they can evaluate 
alternative beliefs and attitudes, self-interested organisms should adopt only 
beneficial attitudes and beliefs and reject those that are individually harmful. 
Thus, beliefs that are costly to the individual should diminish, while beliefs that 
are beneficial to individuals should spread. Extensive theoretical analysis suggests 
that group selection can counteract this process only if groups are very small and 
migration among groups is very limited (Eshel, 1972; Levin and Kilmer, 1974; 
Wade, 1978; Slatkin and Wade, 1978; Boorman and Levitt, 1980; Wilson, 1983; 
Aoki, 1982; Rogers, 1990). As a result, most evolutionary biologists and social 
scientists influenced by them (e.g., Chagnon and Irons, 1979) reject functionalist 
explanations. 

Furthermore, Hallpike (1986) has argued that group extinction does not 
occur often enough to justify functionalist explanations. Group selection works 
by eliminating those societies that have deleterious practices or institutions. If it 
takes a particular number of extinctions to eliminate a deleterious ritual form, 
then it will take a greater number to eliminate the deleterious ritual form and a 
deleterious marriage practice. Still further extinctions will be required to cause 
other aspects of the society to become adaptive. Hallpike argues that human 
societies do not have high enough extinction rates for group selection to cause 
many different attributes to be adaptive at the group level simultaneously. 

In the face of these objections, is there any justification for taking group- 
functional hypotheses seriously? Here we describe a theoretical model and 
present supporting data which show that a role for cultural group selection 
should not be ruled out. Boyd and Richerson (1985, chs. 7 and 8, 1990a, b) have 
analyzed mathematical models of group selection acting on culturally transmitted 
variation and have shown that cultural group selection will work if certain key 
assumptions are met. Ethnographic data from Papua New Guinea and Irian Jaya 
give credence to some of the assumptions that underpin the group-selection 
model. These data also allow us to estimate an upper bound on the rate of
adaptation that could result from group selection. We argue that these data 
suggest that group selection is too slow to be used to justify the common practice 
of interpreting as group-beneficial the detailed aspects of particular cultures. 
However, the data do not exclude the possibility that group selection may account 
for the gradual evolution of some group-level adaptations, such as complex 
social institutions, over many millennia. 


How Group Selection Can Work 

We begin with the premise that individuals acquire various skills, beliefs, attitudes,
and values from other individuals by social learning and that these “cultural
variants,” together with their genotypes and environments, determine their
behavior. To understand why people behave as they do in a particular environment,
we must know the skills, beliefs, attitudes, and values that they have
acquired from others by cultural inheritance. To do this, we must account for 
the processes that affect cultural variation as individuals acquire cultural traits, 
use the acquired information to guide behavior, and act as models for others. 
What processes increase or decrease the proportion of persons in a society who 
hold particular ideas about how to behave? Here we will consider two kinds of 
processes: biased cultural transmission and selection among social groups. 

Biased cultural transmission occurs when individuals preferentially adopt 
some variants relative to others. Individuals may be exposed to a variety of 
beliefs or behaviors, evaluate these alternatives according to their own goals, and 
preferentially imitate those variants that seem best to satisfy their goals. If many 
of the individuals in a population have similar goals, this process will cause the 
cultural variants that best satisfy these goals to spread. For example, if the two 
variants are more and less restrictive forms of food taboos and individuals prefer 
the broader diet that results from the less restrictive variant, then that variant 
will spread. This process, which is important in the spread of innovations (Rogers, 
1983), often tends to cause groups living in similar environments to have similar 
behaviors. 

However, biased cultural transmission can also maintain differences between 
groups of people living in similar environments. This can occur in two ways. First,
a belief or behavior may be more attractive if it is more widely used than the
alternatives. Many social behaviors have this character. For example, if food 
taboos are used as ethnic markers, then in a group in which the more restrictive 
taboo predominates, individuals may choose that taboo over the less restrictive 
one because the social benefits compensate for the nutritional costs. Game theory 
suggests that many kinds of social interactions, including bargaining, contests, and 
punishment-enforced norms, will generate an astronomical number of alternative 
equilibria. Second, when individuals are unable to evaluate the merits of alternative
variants, they may instead use a simple rule of thumb such as adopting the
most common variant. This conformist form of biased transmission causes the
more common variant to increase. For example, if the majority of a group observes
the more restrictive taboo, that taboo will tend to increase in frequency.
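
How conformity can hold two otherwise similar groups at different equilibria can be illustrated with a small numerical sketch. The recursion used here, p' = p + d p(1 - p)(2p - 1), is an assumed simple form for conformist bias and is not given in this chapter; `p` is the frequency of the more restrictive taboo in a group, and `d` is a hypothetical strength-of-conformity parameter.

```python
def conformist_step(p, d):
    """One generation of conformist transmission.

    p: frequency of the more restrictive taboo (0..1).
    d: assumed strength of the conformist bias (0 < d <= 1).
    The variant held by the majority (p > 0.5) gains; the minority variant loses.
    """
    return p + d * p * (1 - p) * (2 * p - 1)


def iterate(p, d, generations):
    """Apply conformist_step repeatedly and return the final frequency."""
    for _ in range(generations):
        p = conformist_step(p, d)
    return p


# Two groups in the same environment that differ only in their starting majority.
group_a = iterate(0.6, 0.3, 200)  # restrictive taboo starts in the majority
group_b = iterate(0.4, 0.3, 200)  # restrictive taboo starts in the minority
print(group_a, group_b)
```

In this sketch the first group converges on the restrictive taboo and the second on the less restrictive one, so the difference between them persists even though both follow the same learning rule.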
When either common-type advantage or conformity maintains differences
among groups, group selection can be an important force. Consider a large population
subdivided into many smaller, partially isolated groups. Suppose that
biased cultural transmission maintains cultural differences among these groups 
despite frequent contact and occasional intermarriage and that these cultural 
differences affect the welfare of the group. For example, groups in which restrictive
food taboos are common may tend to harvest game at approximately
the maximum sustainable yield, while groups in which less restrictive taboos are
common overexploit their game resources and suffer significantly poorer nutritional
status as a result. Further suppose that social groups are occasionally
disrupted and their members dispersed to other local groups and that the rate
at which this occurs depends on the overall welfare of the group. Such disruption
and dispersal may be the result of population decline, social discord, or the
actions of aggressive neighbors. Poor nutritional status will contribute to these
risks. Thus, according to our hypothetical example, groups with less restrictive
food taboos will, on average, be more likely to be broken up and dispersed.
Finally, suppose that as some groups decline and disappear, other groups grow 
and eventually divide, forming new groups, and that the rate at which this occurs 
increases with the overall welfare of the group. Thus, the growing, dividing 
groups will tend to have more restrictive food taboos than declining ones, and 
restrictive food taboos will tend to spread as a result of selection among groups. 
Others have proposed at least implicitly similar models (e.g., Peoples, 1982; 
Divale and Harris, 1976; Irons, 1975). 
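
The logic of this verbal model can be sketched numerically. What follows is a deliberately crude, deterministic illustration and not the Boyd and Richerson model itself. It assumes only two kinds of groups (restrictive and less restrictive taboos), a fixed extinction probability per generation for each kind, and refilling of vacated territories by daughters of surviving groups in proportion to the survivors' numbers. The function names and the extinction rates are hypothetical.

```python
def next_fraction(q, ext_restrictive, ext_unrestrictive):
    """Fraction of groups holding the restrictive taboo after one generation.

    Assumes each kind of group survives with probability (1 - extinction rate)
    and that vacated territories are refilled by daughter groups of survivors,
    in proportion to the survivors' numbers, each daughter copying its parent.
    """
    surviving_r = q * (1 - ext_restrictive)
    surviving_u = (1 - q) * (1 - ext_unrestrictive)
    return surviving_r / (surviving_r + surviving_u)


def generations_to_reach(q0, target, ext_restrictive, ext_unrestrictive, limit=10_000):
    """Count generations until the restrictive-taboo fraction reaches target."""
    q, generations = q0, 0
    while q < target and generations < limit:
        q = next_fraction(q, ext_restrictive, ext_unrestrictive)
        generations += 1
    return generations


# Hypothetical rates: 5 percent versus 15 percent of groups lost per generation.
print(generations_to_reach(0.1, 0.9, 0.05, 0.15))
```

With these invented rates the restrictive taboo needs on the order of 40 generations to go from 10 percent to 90 percent of groups, which shows why the rate of group extinction matters so much for how quickly this process can work.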

This model of group selection differs from those analyzed in population 
biology in that biased transmission maintains variation among groups. Biologists 
have been concerned with whether group selection could allow the evolution of 
altruistic behavior. In these models, natural selection acts against altruistic behavior
in every group, and this selection process tends to reduce variation among
groups. The only process creating variation among groups is genetic drift, a very
weak force. Thus, group selection can have little effect because groups are genetically
very similar. In the model outlined here, it is assumed that various
forms of biased transmission, potentially very strong individual-level forces, act 
to maintain differences among groups and group selection can predominate. 

The form of group selection just outlined can be a potent force even if 
groups are usually very large. For a favorable cultural variant to spread, it must 
become common in an initial subpopulation. The rate at which this will occur 
through random driftlike processes (Cavalli-Sforza and Feldman, 1981) will be 
slow for sizable groups (Lande, 1986). However, this need occur only once. 
Thus, even if groups are usually large, occasional population bottlenecks may 
allow group selection to get started. Similarly, environmental variation in even a 
few subpopulations may provide the initial impetus for group selection. Some 
environments may lead groups to adopt group-beneficial traits because they are 
also individually advantageous. These practices may then spread by group selection
into environments where they have only a group advantage. For example,
restrictive food taboos may arise in a very heterogeneous environment in which 
it is important for individuals to specialize in narrow-range food-procurement 
strategies and only later spread by group selection to less heterogeneous 
environments where they mainly function to protect resources against the
tragedy of the commons. 

Unlike many genetic models, this form of group selection does not require 
that the people who make up groups die during group extinction. All that is 
required is the disruption of the group as a social unit and the dispersal of its 
members throughout the metapopulation. Such dispersal has the effect of cultural
extinction, because dispersing individuals have little effect on the frequency
of alternative behaviors in the future; in any one host subpopulation, they will be 
too few to tip it from one equilibrium maintained by convention or conformity 
to another. 
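
The claim that dispersed refugees are too few to tip a host group can be illustrated with a short sketch. It assumes the same simple conformist recursion, p' = p + d p(1 - p)(2p - 1), which is an assumption rather than a formula from this chapter; `m` is the hypothetical share of refugees in the merged group, and the function names are invented for illustration.

```python
def conformist_step(p, d):
    """One generation of conformist transmission (assumed functional form)."""
    return p + d * p * (1 - p) * (2 * p - 1)


def after_refugees(p_host, p_refugees, m):
    """Frequency of the host's variant once refugees make up a share m of the group."""
    return (1 - m) * p_host + m * p_refugees


def settle(p, d, generations=200):
    """Let conformist transmission run and return the resulting frequency."""
    for _ in range(generations):
        p = conformist_step(p, d)
    return p


# Host group uniformly holds variant A; all refugees arrive holding variant B.
few_refugees = settle(after_refugees(1.0, 0.0, 0.2), 0.3)   # refugees are 20 percent
many_refugees = settle(after_refugees(1.0, 0.0, 0.6), 0.3)  # refugees are 60 percent
print(few_refugees, many_refugees)
```

In the sketch the host's variant recovers when refugees are a minority and is displaced only when they outnumber their hosts, which is the sense in which scattering a defeated group across several hosts amounts to cultural extinction.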

Cultural group selection is very sensitive to the way in which new groups are 
formed. If new groups are mainly formed by individuals from a single preexisting 
group, then the behavior with the lower rate of extinction or higher level of 
contribution to the pool of colonists can spread even when it is rare in the 
metapopulation. If, instead, new groups result from the association of individuals 
from many other groups, group selection cannot act to increase the frequency of 
rare strategies. 


Empirical Evidence 

To justify using this model of cultural group selection we need data that allow us 
to answer three questions: 

1. Do groups suffer disruption and dispersal at a rate high enough to 
account for the evolution of any important attributes of human 
societies? 

2. Are new groups formed mainly by fission in groups that avoid extinction?

3. Are there transmissible cultural differences among groups that affect
their growth and survival, and do these differences persist long
enough for group selection to operate?

To address these questions we present data on group extinction rates, group 
formation, and cultural variability drawn from the ethnographic literature of 
Irian Jaya and Papua New Guinea. We have chosen this area because it offers 
high-quality ethnographic descriptions of peoples that had not been pacified by 
a colonial administration. Colonialism is suspected by some to increase rates of 
intergroup conflict in stateless societies, casting doubt on data from areas like the 
American Plains, where contact predated good ethnography. New Guinea is 
unique in the amount of good ethnography obtained within a few years of first 
contact with complex societies. We have focused on pre-state societies because 
they are characteristic of more of human history than more complex societies, 
and the basic institutions of human societies evolved under stateless conditions. 

We have made an effort to sample as many ethnographies as possible, focusing
on those dealing with pre-contact warfare among indigenous peoples. We
have chosen to focus on warfare only because it is a conspicuous way in which 
groups become extinct and is likely to be recorded. Even where defeat in war is 
the proximate cause of an extinction, a variety of other factors may have precipitated
the event by causing the defeated group to decline in numbers. Extinction
through war may be the common fate of groups that have declined for
some other reason.

We define a group as a territorial population that can conduct warfare as a 
unit. An extinction is said to occur when (1) all members of a group are killed or 
(2) members of a group are assimilated into another group either wholly or in 
part. When a group is routed from its territory but remains intact as a social unit 
(or its fate is unknown), then a forced migration, not an extinction, is said to 
have occurred. 

Group Extinction 

To estimate the rate of group extinction for a region, three types of information 
are needed: (1) the number of extinctions, (2) the number of years over which 
the extinctions took place, and (3) the number of groups among which the 
extinctions took place. We were able to assemble this information for five regions
in Irian Jaya and Papua New Guinea.


The Mae Enga 

The Mae Enga live in the Central Western Highlands, where population density 
averages 40 to 43 persons/km² but reaches densities of over 100 persons/km²
(Meggitt, 1962:158, 1977:1). The immediate causes of war (Meggitt, 1977:13) 
are land disputes (58 percent), other property disputes (24 percent), homicide 
(15 percent), and problems related to sexual jealousy (3 percent). Meggitt recorded
a 50-year warfare history for 14 Mae Enga clans. In the 29 conflicts for
which the outcome was known, there were five extinctions. Extinctions did not 
result from the killing of all group members; routed clan members were forced 
to disperse and find refuge among other clans, often with kin (1977:15, 25-27). 
There is evidence that these immigrants became culturally assimilated into their 
host group, usually within a generation (Meggitt, 1965:31-35). Rapid assimilation
occurred because true clan members received unqualified land rights, as well
as economic, ritual, and military aid. As Meggitt (1977:190) notes, “Members of
defeated and dispersed groups who have gone to live elsewhere have good political
and economic reasons not to draw attention to their immigrant status but
instead try for relatively rapid absorption into the host clan.... In consequence, 
the identities of extinguished clans or subclans are soon lost to public knowledge 
and in time such groups drop out of the genealogies of their former phratries.” 


The Maring 

The Maring live in the Central Highlands, an area of relatively low population 
densities, averaging less than 20 persons/km² (Vayda, 1971:22). Wars are usually
triggered by a murder or attempted murder (56 percent of cases). The remaining 
44 percent are fought over land, women, or theft (1971:4). Vayda’s warfare 
history concerns 32 clan-clusters and autonomous clans and has a depth of about 
50 years (Andrew Vayda, personal communication). He mentions 14 wars in
which victims were routed from their territories. Only in one case was there a 
clear extinction; the other groups eventually returned. However, in two of these 
cases routed clans reclaimed their territory only with the help of the Australian 
police and probably would have become extinct otherwise. Rappaport (1967:26) 
explains that members of vanquished groups who find refuge in another group 
do not maintain their autonomy: “the de facto membership of the living in
groups with which they have taken refuge is converted eventually into de jure
membership. Sooner or later the groups with which they have taken up residence
will have occasion to plant rumbin, thus ritually validating their connection
to the new territory and their new group.”


The Mendi 

The Mendi live in the Southern Highlands, where population density is 18 persons/km²
(Meggitt, 1965:272). Ryan (1959) describes, for a 50-year period, the
history of clan degeneration, extinction, and new group formation for a group 
of nine clans known as the Mobera-Kunjop. In this period there were three clan 
extinctions. In two cases, the clans were routed by warfare and absorbed by other 
groups; in the third a degenerating clan was eventually absorbed by another clan. 

In two cases, vanquished groups did not suffer disruption but managed to 
remain functioning as an intact subclan in their host group. Ryan (1959:271) 
suggests that such accretionary subclans eventually become assimilated into their 
host clan: “The refugee group, consisting of sub-clan brothers and their families,
may be large enough to assume the immediate status of a subclan.... Once the
people have been accepted, granted land, and have settled down, there is almost
no further differentiation made between them and the original subclans.” However,
individual nonagnates suffer discrimination from members of their host
clans (Ryan, 1959). They are less likely to receive bridewealth support (which
normally comes from fellow subclan members) than are true group members, and
therefore refugees have reason to want to assimilate into their host group: “Although
it is asserted that acceptance is complete ... marriage figures indicate that
non-agnatic men tend to marry later than agnatic clan members, more of them
marry only once, and more of them have only one wife at a time” (p. 269).


The Fore and Usufura 

Berndt (1962) recorded detailed descriptions of war involving groups in four 
adjacent linguistic regions of the Eastern Highlands—the Fore, the Usufura, the 
Jate, and the Kamano. Fore population density is approximately 15 persons/km²
and that of the Usufura 27 persons/km² (Berndt, 1962:20). No values are given
for the other linguistic groups. Berndt recorded one extinction during the 10-year
period preceding his research. The group was routed in warfare and dispersed
into several different districts in the area. The number of groups involved
is slightly ambiguous; Berndt indicates that his warfare data are most complete 
for only 8 districts in the area but mentions some 24 districts in his accounts of 
warfare. 
The Tor

The Tor region is located on the northern coast of Irian Jaya (Oosterwal, 1961). 
No density figures are provided. Oosterwal recorded a 40-year history for the 26 
tribal territories in the Tor region. Four tribes suffered extinction either through 
peaceful absorption, military defeat and dispersal, or outright extermination 
(Oosterwal 1961:21-26). In one of the extinctions, Oosterwal is clear about the 
cultural assimilation of the extinct group: “Formerly the Mander language was 
only spoken by the Mander, but since the Foja have lived together with the 
Mander, they have adopted the Mander language entirely. Save for a small 
number of words, these Foja do not recollect any more of their own language. 
Their kinship terminology is also identical with that of the Mander” (p. 23). 

Table 11.1 summarizes extinction rates for the five regions for which there 
were enough data to compute such estimates. We assume that the number of 
groups remains constant, which means that each extinction is followed by an 
immediate recolonization. To the extent that this assumption is wrong, extinction
rates will be higher. We found no ethnographies that yielded an extinction
rate of zero. In our sample, the percentage of groups suffering extinction
each generation ranges from 1.6 percent to 31.3 percent. 
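
The percentages in table 11.1 follow from simple arithmetic: extinctions divided by the number of groups, scaled from the observation period to a 25-year generation. The sketch below reproduces that calculation from the table's own counts. The function name and the labels for the low and high alternatives are illustrative choices, and the constant-number-of-groups assumption is the one stated above.

```python
# (groups, extinctions, years observed), transcribed from table 11.1.
# Regions reported as ranges appear twice, once for each end of the range.
REGIONS = {
    "Mae Enga": (14, 5, 50),
    "Maring (1 extinction)": (32, 1, 50),
    "Maring (3 extinctions)": (32, 3, 50),
    "Mendi": (9, 3, 50),
    "Fore/Usufura (8 districts)": (8, 1, 10),
    "Fore/Usufura (24 districts)": (24, 1, 10),
    "Tor": (26, 4, 40),
}

GENERATION_YEARS = 25  # generation length used in the table


def percent_extinct_per_generation(groups, extinctions, years,
                                   generation=GENERATION_YEARS):
    """Percentage of groups going extinct per generation.

    Assumes the number of groups stays constant, i.e. every extinction is
    followed by an immediate recolonization.
    """
    per_group_per_year = extinctions / groups / years
    return 100 * per_group_per_year * generation


for name, (groups, extinctions, years) in REGIONS.items():
    rate = percent_extinct_per_generation(groups, extinctions, years)
    print(f"{name:28s} {rate:6.2f}")
```

The computed values agree with the table to within rounding. The wide range for the Fore and Usufura comes entirely from uncertainty about whether 8 or 24 districts should be counted as the population at risk.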

It seems likely that other areas in New Guinea had similar group extinction 
rates. There is mention of group extinction in 54 percent (15/28) of the societies 
sampled. This is no doubt an underestimate, because the failure to mention an 
extinction in an ethnographic account of warfare does not necessarily mean that 
extinctions never occurred. In 89 percent (25/28) of the societies sampled, there 
is mention of either group extinction or forced migration (see table 11.2). The 
near ubiquity of extinction and forced migration in the ethnographic record 
suggests that high rates of extinction were common throughout Papua New 
Guinea and Irian Jaya before pacification. 
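
The two proportions quoted above can be recomputed directly from the entries of table 11.2. The sketch below is a tally of that table, with `True` standing for a "+" entry; the variable names are illustrative.

```python
# (people, extinction mentioned, forced migration mentioned), from table 11.2.
MENTIONS = [
    ("Mae Enga", True, False), ("Huli", False, False),
    ("Melpa", True, True), ("Raiapu Enga", True, True),
    ("Wola", True, True), ("Maring", True, True),
    ("Ok", True, True), ("Kuma", True, True),
    ("Chimbu", False, True), ("Usufura", False, True),
    ("Jate", True, True), ("Fore", False, True),
    ("Auyana", True, True), ("Kukukuku", False, True),
    ("Gahuku", False, True), ("Arapesh", True, True),
    ("Abelam", False, True), ("Mailu", False, True),
    ("Kiwai", True, True), ("Dugum Dani", True, True),
    ("Ilaga Dani", False, True), ("Bokondini-Dani", False, True),
    ("Jale", False, True), ("Kapauku", False, False),
    ("Tor", True, True), ("Jaqai", False, False),
    ("Marind-Anim", True, False), ("Bena Bena", True, False),
]

total = len(MENTIONS)
with_extinction = sum(1 for _, ext, _ in MENTIONS if ext)
with_either = sum(1 for _, ext, mig in MENTIONS if ext or mig)

print(f"extinction mentioned: {with_extinction}/{total}")
print(f"extinction or forced migration mentioned: {with_either}/{total}")
```

The tally gives 15 of 28 societies with a mention of extinction and 25 of 28 with a mention of either extinction or forced migration, matching the 54 percent and 89 percent reported in the text.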

New Group Formation 

Group selection is most effective when new groups are made up of members of
a single existing group rather than of members of many different groups. If new
groups are formed when a single group generates a daughter group from among
its own members, then the daughter group will preserve the cultural variants
common in the mother group. Cultural variants that facilitate daughter-group
formation will become more common in the region as a whole.


Table 11.1. Summary of group extinction rates for five regions of Papua
New Guinea and Irian Jaya

Region         Groups   Extinctions   Years   Percentage of groups     Source
                                              extinct every 25 years
Mae Enga       14       5             50      17.9                     Meggitt (1977)
Maring         32       1-3           50      1.6-4.7                  Vayda (1971)
Mendi          9        3             50      16.7                     Ryan (1959)
Fore/Usufura   8-24     1             10      31.3-10.4                Berndt (1962)
Tor            26       4             40      9.6                      Oosterwal (1961)


Table 11.2. Mentions of group extinction and forced migration in Papua
New Guinea and Irian Jaya

People           Extinction   Migration   Source
Mae Enga         +            -           Meggitt (1977:14)
Huli             -            -           Glasse (1959)
Melpa            +            +           Strathern (1971:55-56, 67)
Raiapu Enga      +            +           Waddel (1972:37, 186, 263-65)
Wola             +            +           Sillitoe (1977:79)
Maring           +            +           Vayda (1971:11-13)
Ok               +            +           Morren (1986:266-67, 272-73, 278-79)
Kuma             +            +           Reay (1959:7, 27, 32)
Chimbu           -            +           Brown and Brookfield (1959:41, 61, 263-65)
Usufura          -            +           Berndt (1962:242)
Jate             +            +           Berndt (1962:253, 260-61)
Fore             -            +           Berndt (1962:236, 251, 257)
Auyana           +            +           Robbins (1982:213-14)
Kukukuku         -            +           Blackwood (1978:102)
Gahuku           -            +           Read (1955:253-54)
Arapesh          +            +           Tuzin (1976:63)
Abelam           -            +           Lea (1965:196, 205)
Mailu            -            +           Saville (1926)
Kiwai            +            +           Landtman (1970 [1927]:148-49, 204)
Dugum Dani       +            +           Heider (1970:119-22)
Ilaga Dani       -            +           Sillitoe (1977:77)
Bokondini-Dani   -            +           Sillitoe (1977:76)
Jale             -            +           Koch (1974:79)
Kapauku          -            -           Pospisil (1963)
Tor              +            +           Oosterwal (1961:21-26, 48)
Jaqai            -            -           Boelaars (1981)
Marind-Anim      +            -           Ernst (1979:36)
Bena Bena        +            -           Langness (1964:174)

Societies in Irian Jaya and Papua New Guinea are characterized by a segmentary
social system (Langness, 1964). When members of a social group become
too numerous, the group may split into two similar groups. Conversely,
when members of a social group become too few, they may be absorbed by 
another group at a lower segmentary level (Brown, 1978:184-185, 187-188). 
There are numerous anecdotal accounts of new group formation (e.g., Brown 
and Brookfield, 1959:57; Sillitoe, 1977:79; Vayda, 1971:17; Morren, 1986:269-270),
but Meggitt (1962, 1965) and Ryan (1959) provide the most detailed
descriptions of new group formation in two highland societies. 

The Enga have a nested hierarchy of patrilineal descent groups. The phratry 
is the most inclusive, followed by the clan, the subclan, the patrilineage, and the 
family. Groups everywhere in the hierarchy may grow or decline over time,
generate daughter groups, or become absorbed by other groups: “Groups may
emerge, increase in size and take over different functions, and in doing so achieve
higher status by becoming co-ordinate with groups that previously included
them. In absorption, groups that are decreasing in numbers have to relinquish
particular functions and descend to a lower level in the hierarchy.... If the decline
continues, the groups eventually vanish” (Meggitt, 1965:79). For a group
to achieve or retain a particular position in the hierarchy, it must contain enough 
members to perform the functions appropriate to that position. For example, 
from 1900 onward, the population of one Enga clan began increasing noticeably 
until one of its two subclans could no longer support itself on its share of land 
and began encroaching on a neighboring clan’s territory (Meggitt, 1965:62-63). 
In skirmishes with the neighboring clan, the subclan functioned as if it were a 
sovereign clan, fighting and negotiating homicide payments independently of the
second subclan, which was itself trying to expand in another direction. Eventually
members of the two subclans settled at opposite ends of the clan territory
and behaved as members of separate clans by intermarrying. 

Meggitt (1965:78-79) gives an account of two Laiapu Enga phratries demonstrating
extinction and new group formation. Each phratry was initially made
up of four territorial clans. One expanding clan of phratry A attacked and killed
many members of two clans of phratry B. The survivors of the two clans fled to
other clans, and the victorious clan occupied the abandoned territory. This successful
clan was becoming so large as to achieve subphratry status (Meggitt,
1965:79). Ryan (1959) gives similar accounts of group extinction and new group 
formation in the Mendi Valley. When clans become too populous, they expand 
into new territory and an off-shoot subclan occupies it. The breakaway subclan 
attains clan status as it takes on more and more functions appropriate to a clan. 

Cultural Variation among Groups 

Group extinction and group fission will lead to cultural change only if there are 
transmissible cultural differences that affect the extinction rate or the proliferation
rate. Unfortunately, there is little evidence about the amount of cultural
variation among local groups because so few ethnographers study more than one 
local group. Furthermore, there is even less evidence about how differences 
between local groups are related to individual and group fitness in New Guinea 
ethnography, although there is quite good evidence from other areas that such 
variation exists (e.g., Kelly’s [1985] study of the causes of Nuer expansion at 
the expense of the Dinka). Nor is there evidence about how long such differences
can persist in New Guinea groups. Archaeological and linguistic data
from small-scale societies elsewhere document many examples of group expansion
by cultures with more effective social organization in which the differences
persisted for many generations during the expansionary phase (e.g.,
Bettinger and Baumhoff’s [1982] study of the Numic expansion from southeastern
California across the Great Basin).

Here we review three detailed studies of cultural variation among local
groups in New Guinea. Two of these studies focus on the Mountain Ok of Papua 
New Guinea, while the third covers the lowland Tor region of northern Irian
Jaya. Each of these studies suggests that there is substantial cultural variation 
among local groups. 


The Mountain Ok 

The Mountain Ok occupy the center of New Guinea and are made up of nine 
“tribes” based on ethnolinguistic affinities (Morren, 1986:180-181). Within 
these tribes are endogamous “communities,” sometimes composed of several 
exogamous clans. Only 15 percent of marriages take place between members of 
different communities (Barth, 1971:176). 

Ritual practice and belief vary considerably from community to community. 
Ritual knowledge, surrounded by secrecy, is fully shared by only a few elders in 
each community. It is transmitted at male initiations, where it is rationed out to 
initiates in steps. Barth argues that the ritual knowledge of different communi¬ 
ties diverges because of error and innovation on the part of the few persons who 
control it. This produces intergroup variation in such things as the interpretation 
of important ritual symbols, the use of myths in ritual contexts, theories of 
conception, and the emphasis on symbolic constructions of human sexuality in 
ritual (Barth, 1987). 

Sacred objects used in the initiation ritual take on different symbolic 
meaning in different communities (Barth, 1987:4-5). For example, fat from a 
wild male boar is emphatically “male” among both the Bimin-Kuskusmin and 
Baktaman of the Faiwolmin tribe. The pig’s fat is mixed with various substances 
to form a red paint that is applied to the bodies of novices, except for their 
“female” parts. In communities of the Telefolmin tribe, however, the red paint 
signifies female menstrual blood. In fact, menstrual blood is sometimes added to 
the concoction, a practice which would be “completely destructive” to the 
integrity of the Faiwolmin rituals. 

Modes in which cosmological ideas are communicated also differ among 
Ok communities. The Baktaman know almost no myths at all. A peripheral Ok 
community, the Mianmin, has a larger corpus of myths, but these are not central 
to their ritual events. The Bimin-Kuskusmin, in contrast, have an abundance of 
myths that are integrated into ritual (Barth, 1987:5-6). 

Theories of conception differ among communities (Barth, 1987:13-15). 
Members of the Baktaman and neighboring communities believe that children 
spring from male semen that is nourished in the mother. Telefolmin males 
believe that children are created from a fusion of male and female substances; 
females believe that a fusion of male and female substances creates only the flesh 
and blood of a child, while the female’s menstrual blood alone forms the bones. 
Other communities are characterized by still different theories of conception. 


The Faiwolmin 

Variation among communities within the Faiwolmin tribal area of the Ok region 
may provide an example of cultural variation that is linked to group survival. 
Barth (1971, cf. Morren 1986) argues that more elaborate, communal rituals and 
specialized cult houses lead to more centralized community organization, which 
increases the survivability of the communities embracing them, and that communities 
with less elaborate cultural forms and more dispersed settlement patterns 
are more likely to become extinct. Within the Faiwolmin tribal area, ritual 
organization and specialization find their most elaborate expression in the centralized 
communities (Barth, 1971:179-181). Male initiation is organized in 
seven grades through which males pass as age-sets. In western communities there 
are four such grades, and in the southeastern communities they range from four 
to one (p. 185). Different rituals take place in specialized cult houses. Most 
Faiwolmin communities contain three permanent cult houses as well as a com¬ 
munal men’s house. As one moves east and southward from central Faiwolmin, 
the number of cult houses declines. Most of the southeastern communities 
contain only one permanent cult house, and some perform initiations in tem¬ 
porary structures. 

There is also variation in social organization among Faiwolmin communities, 
following a similar west-to-east pattern of decreasing centralization (Barth, 
1971:184-186). The centralized communities of the Faiwolmin form compact 
villages around several types of semipermanent cult houses, and several exogamous 
clans make up an isolated, largely endogamous political unit. In the east 
the population is dispersed within the community territory, shifting household 
locations at intervals because of soil depletion or fear of sorcery. 

According to Barth, “The dispersed pattern without the cult houses ... clearly 
organizes a smaller population for defense, and their history of displacement 
would seem to demonstrate this disadvantage” (p. 189); “the greater centralization 
clearly also offers military advantages and has resulted in conquest and territorial 
expansion of the more highly centralized groups in a general south-eastward 
direction” (p. 186). He argues that the elaborate rituals and the concomitant 
communal centralization were first introduced to the Faiwolmin communities 
from the northwest, and the diffusion of these cultural forms created cultural 
variation among them. Finally, selection among groups increased the frequency of 
those cultural forms conferring the highest fitness on groups (p. 188): 

The distribution of [cultural] forms is thus generated by a number of 
simultaneously partly independent processes. A process of diffusion 
from an innovation centre ... seems to be taking place. Simultaneously, 
the organization of local cultural transmission is such that both loss and 
improvisation occur and new local variants emerge. Different ritual 
forms imply different community types; these again confront each other 
in warfare and compete and replace each other on the basis of their 
unequal defensive and offensive capacities. 

If Barth is correct, this is an example of group selection increasing the 
cultural variants that enhance group survival. He considers the alternative 
hypothesis that ecological processes explain the smaller scale of social organi¬ 
zation. Although he cannot completely rule out an ecological explanation, he 
clearly suggests that a ritual system that organizes more people and thus leads to 
a greater frequency of victory in violent conflicts is leading to the spread of more 
complex ritual (pp. 188-189). 


The Tor 

Significant cultural variation also existed between tribal territories of the Tor 
region (Oosterwal, 1961). The Tor region is divided into 26 tribal territories, but 
it has 8 separate languages (Oosterwal, 1961: appendix). Thus, many adjacent 
tribes speak different languages, although the most common language, that of the 
Berrick, is known by members of all tribes (Oosterwal, 1961:18). Oosterwal also 
notes these differences: “the three culture areas in the Tor district are very dis¬ 
tinct. ... [There are] differences in ... kinship terminology, the kinship structure, 
the socio-religious aspect of culture, the way of counting, language-(dialect)- 
differentiations, and some aspects of material culture” (p. 46). These three 
“cultural areas,” with associated kinship terminologies, are the Berrick, the Ittik 
and Mander, and the Segar and Naidjbeedj. Tribes in “transitional zones” have 
elements of all three cultural areas, and there is variation within each area (pp. 
149-174). The terminology of the Berrick tribe emphasizes the age criterion (e.g., 
MoElSi is terminologically distinguished from MoYoSi) but often ignores the 
generational criterion (e.g., MoBr and SiSo call each other by the same term). The 
terminology of the second cultural area ignores the generational criterion to a far 
greater extent. In contrast to those of the previous two areas, cultures in the third 
region have a strong generational aspect in their terminology. There is also vari¬ 
ation within each of these three broad areas. For example, the cousin terminology 
of the Berrick is of the Hawaiian type (all cross and parallel cousins called by the 
same terms as those for sisters), while the Waf and Goeammer (of the same 
culture area) use the Iroquois type (FaSiDa and MoBrDa called by the same terms 
but terminologically differentiated from parallel cousins and from sisters, parallel 
cousins commonly but not always classified with sisters). 

Although it is difficult to show that the particular group extinctions that we 
have counted for the five regions are due to persistent cultural differences, there 
is abundant evidence in New Guinea and elsewhere that cultural differences do 
lead to the success of some groups and the decline of others. For example, among 
the Fore the practice of mortuary cannibalism caused the spread of the deadly 
disease kuru. According to Durham’s (1991:411-413) account of this episode, 
ritual cannibalism was originally adopted by Fore women as a response to a short¬ 
age of game. Nevertheless, the spread of the disease as a by-product of this ritual 
innovation threatened Fore groups with extinction until modern medical teams 
intervened. This case points up the ambiguous role of rational choice in the group- 
selection process. Individual calculation of advantage may often run counter to 
group advantage, especially when acts of cooperation are involved. Rappaport 
(1979:100) called attention to the role of the sacred in concealing group- 
advantageous traits from ready attack by selfish reason. As the Fore experience 
with kuru illustrates, traits disadvantageous to groups (and to individuals in this 
case) may sometimes be concealed in the same way. 

Knauft (1985) gives an example of an apparent group extinction in progress. 
The completely acephalous Gebusi were a small and declining group at the time 
of his study. The better-organized Bedamini, making use of the big-man style of 
political organization, were able to raid Gebusi villages, but the Gebusi were 
unable to organize an effective defense or a retaliatory response. The boundary 
Gebusi villages most exposed to Bedamini raids were in the process of assimilating 
to Bedamini customs. 

Knauft (1993) also provides examples of cultural differences among seven 
culture areas along New Guinea’s south coast. He describes how the Marind- 
Anim system of mythico-religious affiliation supports intragroup peace and the 
organization of large-scale head-hunting raids against distant enemies. By con¬ 
trast, the Purari head-hunt among themselves and are declining relative to their 
neighbors. The existence of considerable variation at the scale of language groups 
suggests a considerable time depth for these differences. Although this variation 
occurs among larger groups than we are concerned with here, it does show that 
variation in sociopolitical organization encoded in myth and religion has a strong 
effect on group success. 

It is also important that cultural differences between groups persist on time 
scales sufficient for the operation of group selection. Although there is variation 
among local groups in New Guinea, there are no data bearing on the question of 
how long that variation persists. However, there is ample evidence for the long¬ 
term persistence of cultural differences among larger groups in other culture 
areas. For example, concepts such as mana and tabu typify political culture 
throughout Polynesia despite the fact that these societies have been isolated 
from each other for more than 1,000 years (Kirch, 1984). Edgerton (1971) documents 
the existence of important differences among four tribal groups living in 
two different types of environment, including two tribes belonging to the Bantu 
and two to the Kalenjin language groups, which have been separated for thou¬ 
sands of years. He notes that tribal history is more important than contemporary 
environmental circumstances in explaining most of the variation in attitudes and 
values measured in his data. The roots of the 38 languages of Western American 
Indians go back 6,500 years, and cultural differences among close neighbors with 
different cultural history have persisted for long periods (Jorgensen, 1980:109). 
Belgium is divided by a stable linguistic boundary, with a Flemish North and 
a Walloon South (van den Berghe, 1981); despite the fact that there is no to¬ 
pographical separation, the linguistic frontier has persisted for 2,000 years. Such 
examples from archaeology and history can be multiplied at will. While they do 
not prove that cultural differences can persist at smaller scales as required by the 
model, they indicate that this assumption is plausible. 

Discussion 

Cultural group selection can explain the evolution of group-functional behaviors 
and institutions in human societies if two conditions are met. First, there must be 
some mechanism that preserves between-group variation so that group selection 
can operate. The model described provides one such mechanism, and we have 
here tested several of the model’s basic assumptions against the ethnographic 
record to determine if those assumptions are empirically realistic. Second, group 
selection must be sufficiently rapid to explain observed patterns of cultural 
change. The data from Papua New Guinea and Irian Jaya allow us to estimate the 
maximum rate of adaptation through group selection. Thus, we can estimate a 
minimum time period in which the group-selection process can give rise to group-level 
adaptations. Cultural changes that have occurred on a longer time scale are 
possibly the result of group selection; cultural changes that have occurred on a 
shorter time scale are unlikely to have resulted directly from group selection, but 
they may be its indirect result. For example, cultural group selection may lead to 
the evolution of property rights, which lead to efficient allocations of resources, or 
of political institutions that lead to group-beneficial decisions. 


Model Assumptions 

The data from New Guinea provide some qualified support for the model of 
group selection described. 

1. Group disruption and dispersal are common. Extinction rates per 
generation range from 2 percent to 31 percent, with a median of 
10.4 percent in the five areas for which quantitative data are 
available, and the frequent mention of extinction elsewhere suggests 
that these rates are representative. 

2. New groups are usually formed by fission of existing groups. The 
detailed picture from the Mae Enga and the Mendi is supported 
by anecdotal evidence from other ethnographies. We are not aware 
of any ethnographic report from New Guinea in which colonists of 
new land are drawn from multiple groups. 

3. There is variation among local groups, but it is unknown whether 
this variation persists long enough to be subject to group selection 
and whether this variation is responsible for the differential ex¬ 
tinction or proliferation of groups. 


Rates of Change 

The New Guinea data on extinction rates allow us to estimate the maximum rate 
of cultural change that can result from cultural group selection. For a given group 
extinction rate, the rate of cultural change depends on the fraction of group 
extinctions that are the result of heritable cultural differences among groups. If 
most extinctions are due to nonheritable environmental differences (e.g., some 
groups have poor land) or bad luck (e.g., some groups are decimated by natural 
disasters), then group selection will lead to relatively slow change. If most ex¬ 
tinctions are due to heritable differences (e.g., some groups have a more effective 
system of resolving internal disputes), then group selection can cause rapid 
cultural change. The rate of cultural change will also depend on the number of 
different, independent cultural characteristics affecting group extinction rates. 
The more different attributes, the more slowly will any single attribute respond 
to selection among groups. By assuming that all extinctions result from a single 
heritable cultural difference (or tightly linked complex of differences) between 
groups, we can calculate the maximum rate of cultural change. 

Such an estimate suggests that group selection is unlikely to lead to significant 
cultural change in less than 500 to 1,000 years. The length of time it takes 
a rare cultural attribute to replace a common cultural attribute is one useful 
measure of the rate of cultural change. Suppose that initially a favorable trait 
is common in a fraction $q_0$ of the groups in a region. Then the number of generations 
($t$) necessary for it to become common in a fraction $q_t$ of the groups can 
be estimated (see Appendix). The time necessary for different parameters is 
given in table 11.3. If we take the median extinction rate as representative, these 
results suggest that group selection could cause the replacement of one cultural 
variant by a second, more favorable variant in about 40 generations, or roughly 
1,000 years. If we take the extinction rate calculated using the best data, those 
from the Mae Enga, this time is cut roughly in half. These calculations assume 
that colonizing groups are selected at random from the population. If group 
proliferation is as selective as group extinction, then the time is again cut in half, 
reducing the substitution time (based on the median extinction rate), once again, 
from 1,000 to 500 years. Not all extinctions and new group formations result 
from heritable cultural differences. Since the New Guinea ethnographic data are 
not sufficient to estimate the extent to which cultural variation influences group 
extinctions, it is not possible to make an estimate of the actual strength of group 
selection in New Guinea. If such estimates were possible, we expect that they 
would show that actual rates are considerably below the maximum. The maximum 
rate is nevertheless useful as an upper bound on the kinds of evolutionary 
events that cultural group selection might explain. 

Table 11.3. Minimum number of generations necessary to change the 
fraction of groups in which a favorable trait is common, assuming a particular 
extinction rate 

Initial fraction    Final fraction               Extinction rate 
favorable trait     favorable trait     1.6%     10.4%     17.9%     31% 

0.1                 0.9                 192      40.0      22.3      11.8 
0.01                0.99                570      83.7      46.6      24.8 

Note: Extinction rates were chosen as follows: 1.6 percent (for the Maring) is the lowest 
estimate, 10.4 percent is the median extinction rate, 17.9 percent (for the Mae Enga) is the 
estimate based on the best data, and 31 percent (for the Fore/Usufura) is the highest estimate. 
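
The generation-to-year conversions quoted in this passage can be laid out as a rough worked example. It assumes a generation length of about 25 years, which is inferred here from the text's equating 40 generations with roughly 1,000 years and is not a figure the authors state; the generation counts are the first-row entries of table 11.3.

```latex
\begin{aligned}
\text{median rate } (e = 0.104):\quad
  & 40.0 \text{ generations} \times 25 \text{ years/generation} \approx 1{,}000 \text{ years} \\
\text{Mae Enga rate } (e = 0.179):\quad
  & 22.3 \text{ generations} \times 25 \text{ years/generation} \approx 560 \text{ years} \\
\text{median rate, proliferation as selective as extinction}:\quad
  & \tfrac{1}{2} \times 40.0 \times 25 \approx 500 \text{ years}
\end{aligned}
```

A shorter or longer assumed generation length would scale all three figures proportionally.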

Our estimate of the maximum rate of adaptation suggests that group se¬ 
lection is too slow to account for the many cases of cultural change that occur in 
less than 500 to 1,000 years. For example, according to Feil (1987) the arrival of 
the sweet potato in the highlands of New Guinea sometime in the eighteenth 
century led to many important cultural changes. The introduction of the horse 
to the Great Plains of North America in the 1500s led to the evolution of the 
culture complex of the Plains Indians in less than 300 years. If the rates of group 
extinction estimated for New Guinea are representative of small-scale societies, 
cultural changes such as these cannot be explained in group-functional terms. 
There has not been enough time for group selection to have driven a single 
cultural attribute to fixation, even if that attribute had a strong effect on group 
survival. Processes based on individual decisions are likely to account for such 
episodes of rapid evolution (see Smith and Winterhalder, 1992; Boyd and 
Richerson, 1985). Such processes will not lead to group-functional outcomes 
except in certain special circumstances (see n. 1). It is possible that situations in 
which a trait or trait complex that increases the scale of cooperation is spreading, 
such as the one Barth posits for the Faiwolmin, do show rapid cultural group 
selection in progress. If the arrival of the sweet potato a few centuries ago did 
provide the subsistence basis for larger and more complex societies, we might 
expect to observe group selection in the early to middle stages of the spread of 
newly advantageous forms of social organization (Golson and Gardner, 1990; 
Feil, 1987). 

These results also suggest that group selection cannot justify the practice 
of interpreting many different aspects of a culture as group-beneficial. A given 
extinction rate will lead to slower change if many different, unrelated aspects of 
the culture affect group survival. Suppose that both beliefs about food con¬ 
sumption and beliefs about spatial organization affect group survival. Then, 
unless each extinction occurs in a group in which both deleterious beliefs about 
food consumption and deleterious beliefs about spatial organization are com¬ 
mon, some extinctions have no effect on the fraction of groups with deleterious 
beliefs about food, and some extinctions have no effect on the fraction of groups 
with deleterious beliefs about spatial organization. Thus, a given number of 
extinctions must lead to slower evolution of each character than would be the 
case if only one of the characters affected group survival. If group selection can 
cause the substitution of a single trait in 500 to 1,000 years, the rate for many 
traits will be substantially longer. We know from linguistic and archaeological 
evidence that related cultural groups that differ in many cultural attributes have 
often diverged from a single ancestral group in the past few thousand years. 
Thus, there has not been enough time for group selection to have produced the 
many attributes that distinguish one culture from another. 
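
The slowdown described above can be illustrated with a small numerical sketch. This is not the authors' model: it assumes (hypothetically) that a group's extinction probability is the overall rate `e` scaled by the share of its traits that are deleterious, that traits are initially distributed independently across groups, and that vacated sites are recolonized by a random surviving group. The function name and parameters are illustrative only.

```python
from itertools import product


def generations_to_spread(e, n_traits, q0=0.1, q_target=0.9, max_gen=100_000):
    """Count generations until trait 0 is advantageous in q_target of groups.

    Assumptions (not from the text): extinction probability of a group is
    e * (number of deleterious traits it carries) / n_traits, traits start
    out independent across groups, and survivors recolonize at random.
    """
    # A group type is a tuple of 0/1 flags; 1 means the advantageous
    # variant of that trait is common in the group.
    types = list(product((0, 1), repeat=n_traits))
    freq = {}
    for g in types:
        p = 1.0
        for flag in g:
            p *= q0 if flag else 1.0 - q0
        freq[g] = p

    for gen in range(max_gen + 1):
        # Fraction of groups in which trait 0 is advantageous.
        q = sum(f for g, f in freq.items() if g[0] == 1)
        if q >= q_target:
            return gen
        # Apply extinction, then renormalize (random recolonization).
        survive = {}
        for g, f in freq.items():
            n_deleterious = n_traits - sum(g)
            survive[g] = f * (1.0 - e * n_deleterious / n_traits)
        total = sum(survive.values())
        freq = {g: s / total for g, s in survive.items()}
    raise RuntimeError("target fraction not reached within max_gen")


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, "trait(s):", generations_to_spread(0.104, k), "generations")
```

With one trait this reduces to the single-trait recursion in the Appendix; with more traits sharing the same overall extinction rate, each trait takes longer to spread, which is the qualitative point of the paragraph.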

It is important to understand that slow does not necessarily mean weak. 
When individual decision making is in opposition to group function in every 
group, then the relatively slow group-selection process will be too weak to favor 
group-functional behaviors. But when social interaction results in many alter¬ 
native stable social arrangements, then individual decision making maintains 
differences among groups. If the resulting variation is linked to group fitness, 
then group selection will proceed. For example, consider the response to an 
environmental change such as the opening of New Guinea to trade with Euro¬ 
peans. Initially, changes in the costs and benefits of alternative beliefs and values 
will cause rapid cultural change, soon leading to a new sociopolitical equilibrium 
in each culture. But if there are many alternative equilibria, the nature of each 
new equilibrium may depend on existing norms and values. As long as the 
resulting differences affect group survival, selection among groups will continue. 
Over a millennium or so, New Guinea societies with a better political adaptation 
to world contact will replace those with a poorer adaptation. 

Thus, these results do not preclude interpreting some aspects 
of contemporary cultures in terms of their benefit to the group. The model 
demonstrates that under the right conditions group selection can be an important 
process, and the data from New Guinea suggest that some of these conditions are 
empirically realistic. The data also suggest that the rates of group extinction are 
high enough to cause a small number of traits with substantial effects on group 
welfare to evolve on time scales that characterize some aspects of cultural change. 
Group selection cannot explain why the many details of Enga culture differ from 
the many details of Maring culture. It might explain the existence of geographi¬ 
cally widespread practices that allow large-scale social organization in the New 
Guinea highlands, practices that evolved along with, and perhaps allowed, the 
transition from band-scale societies to the larger-scale societies that exist today. 

Cultural group selection provides a potentially acceptable explanation for 
the increase in scale of sociopolitical organization in human prehistory and 
history precisely because it is so slow. Scholars convinced of the overwhelming 
power of individual-level processes have real difficulty in explaining slow, long¬ 
term historical change. Anatomically modern humans appear in the fossil record 
about 90,000 years ago, yet there is no evidence for symbolically marked 
boundaries (perhaps indicative of a significant sociopolitical unit encompassing 
an “ethnic” group of some hundreds to a few thousand individuals) before about 
35,000 years ago (Mellars and Stringer, 1989). The evolution of simple states 
from food-producing tribal societies took about 5,000 years, and that of the 
modern industrial state took another 5,000. Evolutionary processes that lead 
to change on 10- or 100-year time scales cannot explain such slow change unless 
they are driven by some environmental factor that changes on longer time scales. 
In contrast, the more or less steadily progressive trajectory of increasing scale of 
sociopolitical complexity over the past few tens of thousands of years indeed is 
consistent with adaptation by a relatively slow process of group selection. 

These results should be interpreted with caution. It is important to re¬ 
member that we have estimated a maximum rate of change for group selection 
on the basis of the assumptions that observed differences among local groups are 
heritable and that they are persistent. Unless both assumptions are satisfied, 
group selection will be less important than our results indicate. It is also im¬ 
portant to keep in mind that we have studied only one form of group selection— 
competition among small, culturally heterogeneous groups. Other plausible 
group-selection processes might lead to more rapid change. For example, one 
cultural region may encroach upon another along a frontier, constantly capturing 
additional land and gradually expanding its domain. The Nuer and Dinka formed 
such a system before they were both overtaken by European colonists (Kelly, 
1985). In state-level societies, we have to allow for internal group selection via 
the extinction and proliferation of subgroups, such as ruling classes, interest 
groups, firms, and the like, as well as selection among states themselves (Hannan 
and Freeman, 1989). Some economists have considered business failure and 
proliferation rates sufficient to drive group selection of these units (Alchian, 
1950; Nelson and Winter, 1982). The development of collective decision¬ 
making institutions like bureaucracies and legislatures may permit group- 
functional behaviors to be deliberately adopted by state-level societies. These 
processes might act at a much faster rate than we have estimated on the basis of 
tribal institutions. 

In conclusion, these data suggest that group selection cannot explain rapid 
cultural change or the many differences between related cultures. However, they 
also show that group selection, perhaps in concert with other processes, is a 
plausible mechanism for the evolution of widespread attributes of human soci¬ 
eties over the long run. 


APPENDIX: Time for Trait Substitution 


Assume that there are two cultural variants—deleterious and advantageous. Each 
is at a local equilibrium under the influence of within-group processes. Groups are 
connected by the mixing of individuals, and there are many such groups. Groups 
in which the advantageous variant is common never go extinct. A fraction e of the 
groups in which the deleterious variant is common suffer an extinction each gener¬ 
ation. The dynamics of this system are quite complicated because the frequency of 
advantageous variants within subpopulations in which that variant is common de¬ 
pends, to a small degree, on the frequencies of both variants in the population as 
a whole. However, if both variants are in local equilibrium, even when there is only a 
single population in which they are common, then it is roughly correct to regard the 
subpopulations as individuals and use formulas from population genetics (see Boyd 
and Richerson, 1990b for a fuller treatment). Then, if the advantageous trait is 
common in a fraction q of the groups in the region, after one generation 


$$
q' = \frac{q}{(1 - q)(1 - e) + q}
$$

and the frequency after t generations is 

$$
q_t = \frac{q_0}{(1 - q_0)(1 - e)^t + q_0}
$$

Solving this for t yields 

$$
t = \frac{\ln\left[\dfrac{q_0 (1 - q_t)}{(1 - q_0)\, q_t}\right]}{\ln(1 - e)}
$$


which was used to generate table 11.3. 
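
As a check on the closed-form expression for t, the following minimal sketch recomputes the entries of table 11.3 from the four extinction rates listed in the table note. The function name is hypothetical. Under this formula the 10.4, 17.9, and 31 percent columns and the 570 entry come out as printed; the first-row entry for 1.6 percent comes out nearer 272 than the printed 192, so that one cell may reflect a different input or a misprint.

```python
import math


def generations_for_substitution(e, q0, qt):
    """Generations for the favorable trait to go from fraction q0 to qt.

    e is the per-generation extinction rate of groups in which the
    deleterious variant is common; groups with the advantageous variant
    are assumed never to go extinct, as in the Appendix.
    """
    if not (0.0 < e < 1.0 and 0.0 < q0 < qt < 1.0):
        raise ValueError("need 0 < e < 1 and 0 < q0 < qt < 1")
    ratio = (q0 * (1.0 - qt)) / ((1.0 - q0) * qt)
    return math.log(ratio) / math.log(1.0 - e)


# Extinction rates taken from the note to table 11.3.
RATES = (0.016, 0.104, 0.179, 0.31)

if __name__ == "__main__":
    for q0, qt in ((0.1, 0.9), (0.01, 0.99)):
        row = [generations_for_substitution(e, q0, qt) for e in RATES]
        print(q0, qt, " ".join(f"{t:7.1f}" for t in row))
```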


NOTES 

We thank Philip Newman, Paul Sillitoe, Andrew Vayda, Mark Allen, and Bob 
Rechtman for help in locating data used in this analysis. Joan Silk, Timothy Earle, 
Eric Smith, Paul Allison, Lore Ruttan, Mark Jenike, Alan Rogers, Monique Bor- 
gerhoff Mulder, and an anonymous referee provided very useful comments on earlier 
drafts of this chapter. Members of the University of Bielefeld’s Center for Interdis¬ 
ciplinary Research project on the Biological Foundations of Human Culture provided 
a constructively critical audience for an early version (special thanks are due its 
director, Peter Weingart). Jonathan Turner convinced us that state-level institutions 
are different from tribal ones. 

1. Some authors (e.g., Harris, 1979) have suggested that the self-interested choices 
of individuals will result in group-beneficial behavior. However, this claim is not 
cogent—group-beneficial behavior will not result from individual choice except as a 
side effect of other processes or in certain limited circumstances. For example, many 
authors have suggested that food taboos exist because they prevent overexploitation 
of ecological resources. To keep things simple, let us suppose that individuals must 
choose to observe a particular taboo or not and that individuals who observe this 
taboo forgo a satisfying and nutritious food item. Choosing to ignore the taboo has a 
positive effect on individuals’ own welfare and, by assumption, a negative effect on 
the welfare of the group. However, unless the group is very small, the personal effect 
will be much larger than the effect on the group, and thus choosing to ignore the 
taboo will better serve individuals’ goals, even if their goals include the welfare of the 
group. This effect is at the heart of both rational-strategy and evolutionary arguments 
against the easy development of group-beneficial behavior. The effect is not a matter 
of cognitive capacity, as writers such as Harris seem to imply. Rational strategists are 
assumed to have unlimited cognitive capacity, whereas evolved creatures are the 
products of blind selective sorting, but the essential problem is the same; both ra¬ 
tional strategists and evolved creatures are expected to act in their own self-interest. 

Group-beneficial behavior may result from self-interested individual choice 
under certain circumstances. First, since individual and group benefit are often cor¬ 
related, individual choice may often produce group-beneficial outcomes as a side 
effect (see Sugden, 1986, for several examples). Second, markets will lead to an 
“efficient” allocation of economic resources if the state or some other external authority 
enforces contracts, external effects such as air pollution are not present, and a 
number of other conditions are satisfied. The allocation is efficient only in the sense 
that no one can be made better off without someone else’s being made worse off— 
the distribution of wealth that results could be extremely deleterious to the survival 
of the society. Clearly, most aspects of culture are not regulated by markets or prices, 
even in contemporary societies. Third, rational planning by leaders or institutions 
may also lead to group-beneficial outcomes. While the extent to which political 
institutions can ever be modeled as acting in the common interest is debatable, it is 
clear that most aspects of culture are not the result of rational planning. Finally, 
individuals may choose group-beneficial activities if they value those activities for 
their own sake, not because they benefit the group (Margolis, 1982; Batson, 1991). 
For example, men may fight to defend the group if they value heroism in battle. 
However, one is left with explaining how men come to have such preferences— 
otherwise, the explanation is that people choose group-beneficial behaviors because 
they like to do so. Thus, we do not deny that people make group-beneficial choices. 
We are claiming that when such choices occur, they cannot be the result of mainly 
self-interested choice. 

12 Group-Beneficial Norms Can Spread Rapidly in a Structured Population 


Many culturally transmitted norms are group-beneficial (Sober and 
Wilson, 1998): property rights encourage productive effort, rules against murder 
and assault encourage civil order, norms governing the filling of political offices 
reduce the chances of civil war, and product standards, building codes, and rules 
of professional conduct allow more efficient commerce. For most of human 
history, states were weak or nonexistent, and norms were not enforced by ex¬ 
ternal sanctions. Nonetheless, norms were important regulators of social order, 
and while in modern states black-letter laws also further many of the same ends 
as informal norms, the evidence is that informal custom still plays a very im¬ 
portant role in regulating behavior (Ellickson, 1991). 

The persistence of group-beneficial norms is easily explained. When people 
interact repeatedly, behavior can be rewarded or punished, and such incentives 
can stabilize almost any behavior once there is consensus about what is nor¬ 
mative. People conform to normative behavior in order to gain rewards or avoid 
punishment. The provision of rewards and punishments can be explained in 
several ways: first, if interactions are repeated indefinitely, punishing or re¬ 
warding also can be normative behaviors, and violators of that norm can be 
punished or rewarded as well (Boyd and Richerson, 1992a). Second, even if 
interactions do not go on indefinitely (or equivalently, people cannot remember 
a large number of interactions), the relative disadvantage suffered by those who 
enforce social norms compared with those who do not rapidly becomes small 
as the number of interactions increases and is easily balanced by even a weak 
tendency to imitate the common type (Henrich and Boyd, 2001). (Of course, 
strong conformism can also explain the maintenance of norms without punishment; 
Boyd and Richerson, 1985.) Finally, punishment may be individually 
beneficial if it is a costly signal of an individual’s qualities as a mate or coalition 
partner (Bliege Bird, Smith, and Bird, 2001). Several authors suggest that the 
stability of such norms explains human cultural diversity—distinct groups rep¬ 
resent alternative, stable equilibria in a complex, repeated “game of life” (Boyd 
and Richerson, 1992b; Binmore, 1994; Cohen, 2001). 

The fact that group-beneficial norms can persist does not explain why such 
norms are widely observed. While punishment and reward can stabilize group- 
beneficial norms, they can also stabilize virtually any behavior (Fudenberg and 
Maskin, 1986; Boyd and Richerson, 1992a). We can be punished if we lie or 
steal, but we can also be punished if we fail to wear a tie or refuse to eat the 
brains of dead relatives. Thus, we need an explanation for why populations 
should be more likely to wind up at a group-beneficial equilibrium than one of 
the vastly greater number of stable but non-group-beneficial equilibria. Put an¬ 
other way, if social diversity results from many stable social equilibria, then 
social evolution must involve shifting among alternative stable equilibria. Group- 
beneficial equilibria will be common only if the process of equilibrium selection 
tends to pick out group-beneficial equilibria. 

Currently, there are two different kinds of models of equilibrium selection, 
but neither provides a plausible explanation for the widespread existence of 
group-beneficial norms. 

Within-group models of equilibrium selection (Kandori, Mailath, and Rob, 
1993; Ellison, 1993; Young, 1998; Samuelson, 1997) consider the effects of 
random processes that act within groups to change the frequency of alternative 
behavioral strategies. In finite populations, sampling variation will affect patterns 
of interaction and replication, which in turn will lead to random fluctuations in 
the frequencies of types through time. As long as some mutation-like process 
acts to maintain variation, the probability that the population will be in any state 
will eventually converge to a stationary distribution. If mutation rates are low 
and populations are of reasonable size, most of the probability mass of the 
stationary distribution will pile up around the stable equilibrium of the deter¬ 
ministic dynamic model that has the largest basin of attraction. Since there is no 
necessary relationship between the size of a basin of attraction and whether it 
is group beneficial, within-group models do not predict that group-beneficial 
norms will be common. Within-group models also suffer from two other related 
problems. First, it takes a very long time for populations to shift from one 
equilibrium to another unless the number of interacting individuals is very small. 
Second, these models provide no mechanism for cumulative irreversible social 
change because populations are assumed to be in stochastic steady state, ran¬ 
domly wandering back and forth between alternative equilibria. 

Between-group models posit that equilibrium selection results from the 
competition between groups near alternative stable equilibria. These models 
assume that groups at more efficient equilibria are less likely to go extinct, or 
more able to compete with other groups in military or economic contests. This 
kind of group selection process leads to the evolution of group-beneficial equi¬ 
libria even when groups are large, and there is substantial migration between 
groups (Boyd and Richerson, 1982, 1990). However, given observed rates of 
group extinction, the spread of group-beneficial equilibria will occur too slowly 
to account for much observed social evolution. Calculations based on empirical 
data on the social extinction of small groups in highland New Guinea suggest 
that even though rates of extinction are appreciable, the time scale for the 
substitution of one norm by a better one is on the order of a millennium (Soltis, 
Boyd, and Richerson, 1995). Moreover, these models also lack any mechanism 
that allows for the efficient recombination of group-beneficial innovations oc¬ 
curring in different groups, and thus cannot easily account for the cumulative 
nature of social change over the last 10,000 years. 

Here, we show that when the standard replicator dynamic model of evolu¬ 
tionary game theory is embedded in a spatially structured population, group- 
beneficial equilibria can spread rapidly and innovations can readily recombine to 
form beneficial new combinations. The basic logic of this result is simple: evolu¬ 
tionary game theory is applicable to human social evolution when behavioral 
strategies are transmitted by imitation, and people who have achieved high payoffs 
are most likely to be imitated. Strategies that have high average payoffs will in¬ 
crease in frequency, in most cases eventually leading to a stable evolutionary 
equilibrium state. If the payoff structure of social interactions leads to multiple 
stable equilibria and a population is structured, partially isolated groups can be 
stabilized at different equilibria with different average payoffs. Consequently, be¬ 
haviors can spread from groups at high payoff equilibria to neighboring groups at 
lower payoff equilibria because people imitate their more successful neighbors. 
Such spread can be rapid because it depends on the rate at which individuals 
imitate new strategies, rather than the rate at which groups become extinct. 

In what follows, we first derive the dynamic equations that govern replicator 
dynamics in a spatially structured population. We then show that these equa¬ 
tions can lead to the rapid spread of group-beneficial traits under plausible con¬ 
ditions. Finally, we show that this process readily leads to the recombination of 
different group-beneficial traits that arise in different populations. 


Replicator Dynamics in a Structured Population 

In many situations, people have important social interactions shaped by social 
norms with one group of people but know about the behavior, and the norms 
that regulate it, of a larger group of people. People interact every day with the 
members of their local group—they exchange food, labor, and land; aid others 
in need; marry and care for children—transactions that are regulated by social 
norms that define property rights and moral obligations. However, people also 
often know about the behavior of others in neighboring groups. They know that 
we can marry our cousins here, but over there they cannot; or anyone is free 
to pick fruit here, while there fruit trees are owned by individuals. With this kind 
of population structure, payoffs are determined by the composition of the local 
group, but cultural traits can diffuse among groups. 

To generalize evolutionary game theory to allow for this kind of popula¬ 
tion structure, consider a population that is subdivided into n large groups in 
which frequent social interaction occurs. Individuals are characterized by one of 
k strategies. The proportion of people in group d who have strategy i is $p_{id}$, and 
the vector of frequencies in group d is $\mathbf{p}_d$. Social interaction generates a payoff, 
$W_i(\mathbf{p}_d)$, for individuals with behavior i in group d that depends on individuals’ 
own strategy and the strategies of other members of their group because fre¬ 
quent social interaction occurs with other group members. 

To allow for the possibility of cultural diffusion between groups, we adopt 
the following model of cultural transmission: during each time period, each 
individual from group f encounters an individual, their “model,” from group d 
with probability $m_{df}$ and observes that individual’s strategy and payoff from 
social interaction during that period. We will assume that $m_{ff} \gg m_{df}$ for $d \neq f$, so that 
most encounters occur within social groups. After the encounter, individuals 
may imitate the strategy of their model. 

We assume that individuals are more likely to imitate if their model has a 
higher payoff than they do. More formally, if an individual with behavior i from 
group f encounters an individual with behavior j from group d, individual i 
switches to j with this probability: 

$$\Pr(j \mid i, j) = \tfrac{1}{2}\left(1 + \beta\left(W_j(\mathbf{p}_d) - W_i(\mathbf{p}_f)\right)\right) \tag{1}$$

where $\beta$ is a positive parameter that scales payoffs so that $0 < \Pr(j \mid i, j) < 1$ for 
all $\mathbf{p}_d$ and $\mathbf{p}_f$. Equation (1) implies that individuals sometimes switch to a lower 
payoff strategy, unlike some recent derivations of replicator dynamics (Borgers 
and Sarin, 1997; Schlag, 1998; Gale, Binmore, and Samuelson, 1995). We think 
this model is preferable because it captures the effect of uncertainty about the 
payoffs of others, and because it allows diffusion between groups even when 
there are no payoff differences, a conservative feature that reduces the effect of 
population structure. 
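
As a minimal sketch of equation (1), the switching rule can be written as a small function. The function name, the argument names, and the numerical values in the usage lines are illustrative assumptions, not part of the original model description; the only thing taken from the text is the form one half times (1 + β × payoff difference) and the requirement that the result stay strictly between 0 and 1.

```python
def imitation_probability(w_model, w_self, beta):
    """Probability that an individual switches to the model's strategy.

    Follows the form of equation (1): 0.5 * (1 + beta * (W_model - W_self)).
    beta must be small enough that the result stays strictly inside (0, 1).
    """
    prob = 0.5 * (1.0 + beta * (w_model - w_self))
    if not 0.0 < prob < 1.0:
        raise ValueError("beta is too large for this payoff difference")
    return prob


# Hypothetical payoffs, chosen only for illustration.
# Equal payoffs: switching is a coin flip, the source of passive diffusion.
print(imitation_probability(w_model=1.5, w_self=1.5, beta=0.4))
# A less successful model is still imitated sometimes, just less often.
print(imitation_probability(w_model=1.0, w_self=1.5, beta=0.4))
```

The explicit range check reflects the role the text gives to β: it is a scaling constant, so a value that pushes the probability outside (0, 1) signals a badly scaled model rather than a meaningful result.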

Then the frequency of behavior i in group f, $p'_{if}$, after one time period is 
given by equation (2): 

$$p'_{if} = \sum_{d} m_{df}\left[\, p_{if}\sum_{j} p_{jd}\,\tfrac{1}{2}\left(1 + \beta\left(W_i(\mathbf{p}_f) - W_j(\mathbf{p}_d)\right)\right) + p_{id}\sum_{j} p_{jf}\,\tfrac{1}{2}\left(1 + \beta\left(W_i(\mathbf{p}_d) - W_j(\mathbf{p}_f)\right)\right)\right] \tag{2}$$

The first sum inside the square brackets gives the probability that an individual with 
trait i in group f remains the same, and the second sum gives the probability that 
someone who is not i initially converts to i. Some algebraic manipulation yields the 
following expression for the change in the frequency of behavior i in population f: 

$$p'_{if} - p_{if} = \delta p_{if}\left(1 - \sum_{d \neq f} m_{df}\right) + \sum_{d \neq f} m_{df}\left[\frac{\delta p_{if} + \delta p_{id}}{2} + \tfrac{1}{2}\left(p_{id} - p_{if}\right)\left(1 + \beta\left(\bar{W}(\mathbf{p}_d) - \bar{W}(\mathbf{p}_f)\right)\right)\right] \tag{3}$$

where $\delta p_{if} = \beta p_{if}\left(W_i(\mathbf{p}_f) - \bar{W}(\mathbf{p}_f)\right)$, with $\bar{W}(\mathbf{p}_f)$ the average payoff in group f, is the replicator dynamic equation for strategy 
i in group f and is the canonical description of strategy dynamics in evolutionary 
game theory. Thus, when individuals imitate only members of their own group 
($m_{df} = 0$, $d \neq f$), equation (3) says that imitation within each group causes 
behaviors with the highest payoff relative to others in the group to increase in 
frequency; effects on average payoff within a group are irrelevant. When there 
is contact between different groups, however, the effect of a behavior on average 
group payoff can become important. The second term in equation (3) includes 
the effect of diffusion between groups that differ in trait frequency. When 
payoffs do not affect imitation ($\beta = 0$), this term includes only passive diffusion. 
However, when individuals with higher payoffs are more likely to be imitated, 
there is a net flow of strategies from groups with high average payoff to groups 
with lower average payoff. 
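
The algebra between the one-period update and the change equation is not spelled out in the text. The following is a sketch of one route, assuming the switching rule in which an individual with behavior i in group f who meets a model with behavior j from group d switches with probability one half times $1 + \beta(W_j(\mathbf{p}_d) - W_i(\mathbf{p}_f))$, and writing $\bar{W}(\mathbf{p}_d) = \sum_j p_{jd} W_j(\mathbf{p}_d)$ for the average payoff in group d.

```latex
\begin{aligned}
&\text{Step 1 (sum over } j \text{, using } \textstyle\sum_j p_{jd} = 1\text{):}\\
&\quad p'_{if} = \sum_d m_{df}\,\tfrac12\Big[\,p_{if}\big(1+\beta(W_i(\mathbf p_f)-\bar W(\mathbf p_d))\big)
      + p_{id}\big(1+\beta(W_i(\mathbf p_d)-\bar W(\mathbf p_f))\big)\Big]\\[4pt]
&\text{Step 2 (the } d=f \text{ term, with } m_{ff} = 1-\textstyle\sum_{d\neq f} m_{df}\text{):}\\
&\quad \tfrac12\big[2p_{if} + 2\beta p_{if}(W_i(\mathbf p_f)-\bar W(\mathbf p_f))\big] = p_{if}+\delta p_{if}\\[4pt]
&\text{Step 3 (for } d\neq f \text{, add and subtract average payoffs):}\\
&\quad \beta p_{if}\big(W_i(\mathbf p_f)-\bar W(\mathbf p_d)\big) = \delta p_{if} - \beta p_{if}\big(\bar W(\mathbf p_d)-\bar W(\mathbf p_f)\big)\\
&\quad \beta p_{id}\big(W_i(\mathbf p_d)-\bar W(\mathbf p_f)\big) = \delta p_{id} + \beta p_{id}\big(\bar W(\mathbf p_d)-\bar W(\mathbf p_f)\big)\\[4pt]
&\text{Step 4 (collect terms and subtract } p_{if}\text{):}\\
&\quad p'_{if}-p_{if} = \delta p_{if}\Big(1-\sum_{d\neq f} m_{df}\Big)
   + \sum_{d\neq f} m_{df}\Big[\tfrac12(\delta p_{if}+\delta p_{id})
   + \tfrac12(p_{id}-p_{if})\big(1+\beta(\bar W(\mathbf p_d)-\bar W(\mathbf p_f))\big)\Big]
\end{aligned}
```

Two checks on the result follow directly: setting every between-group $m_{df}$ to zero leaves only $\delta p_{if}$, the ordinary replicator term, and setting $\beta = 0$ leaves $\tfrac12\sum_{d \neq f} m_{df}(p_{id} - p_{if})$, which is pure passive diffusion.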


How Group-Beneficial Equilibria Spread 

Next, we show how this effect can lead to the spread of group-beneficial equi¬ 
libria. Consider a simple model in which there are two strategies, 1 and 2. For 
example, strategy 1 might be a norm forbidding cousin marriage, while strategy 
2 is the norm allowing free choice of a spouse. Within each group, individuals 
who deviate from the common norm suffer because they are punished by other 
group members. The norm forbidding cousin marriage might lead to higher 
average payoff due to the formation of wider political alliances. We formalize 
these ideas by assuming that the payoff to an individual with behavior 1 in group 
d is $W_1(p_{1d}) = 1 + s(p_{1d} - \hat{p}) + g p_{1d}$ and the payoff to an individual using 
behavior 2 is $W_2(p_{1d}) = 1 + g p_{1d}$. Thus, each strategy has a higher relative payoff 
when common. The unstable equilibrium that divides the two basins of attraction 
is $\hat{p}$. The parameter s measures the magnitude of the difference in payoffs 
of the two strategies, and g measures the effect of behavior 1 on average payoff. 
We assume that g > 0, so that groups in which behavior 1 is common have higher 
average payoff. For example, a norm against cousin marriage might lead to more 
alliance formation among clans within the group. Finally, for simplicity, we assume 
that social groups are arranged in a ring so individuals imitate only 
members of their own group and the two neighboring groups (so that $m_{df} = m$ 
for the two neighbors of group f and zero otherwise). 
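
The two-strategy ring model can be explored numerically. The sketch below is not the authors' simulation code; it is one way to iterate the one-period update for groups on a ring, assuming the payoff functions $W_1 = 1 + s(p - \hat{p}) + gp$ and $W_2 = 1 + gp$ and an explicit scaling parameter `beta`. The function names, the guard on `beta`, and the parameter values in the usage section are all illustrative assumptions.

```python
def payoffs(p1, s, g, p_hat):
    """Return (W1, W2) in a group where strategy 1 has frequency p1."""
    w1 = 1.0 + s * (p1 - p_hat) + g * p1
    w2 = 1.0 + g * p1
    return w1, w2


def mean_payoff(p1, s, g, p_hat):
    """Average payoff in a group where strategy 1 has frequency p1."""
    w1, w2 = payoffs(p1, s, g, p_hat)
    return p1 * w1 + (1.0 - p1) * w2


def step(p, m, s, g, p_hat, beta):
    """Advance one time period; p[f] is the frequency of strategy 1 in group f.

    Each group meets itself with probability 1 - 2m and each ring
    neighbor with probability m, then imitates with probability
    0.5 * (1 + beta * payoff difference).
    """
    # Conservative bound (an assumption of this sketch) that keeps every
    # imitation probability strictly inside (0, 1).
    if beta * (g + 2.0 * s) >= 1.0:
        raise ValueError("beta too large for these payoffs")
    n = len(p)
    new_p = []
    for f in range(n):
        w1_f, _ = payoffs(p[f], s, g, p_hat)
        wbar_f = mean_payoff(p[f], s, g, p_hat)
        total = 0.0
        contacts = ((f, 1.0 - 2.0 * m), ((f - 1) % n, m), ((f + 1) % n, m))
        for d, m_df in contacts:
            w1_d, _ = payoffs(p[d], s, g, p_hat)
            wbar_d = mean_payoff(p[d], s, g, p_hat)
            # Strategy-1 individuals in f who keep strategy 1.
            stay = p[f] * 0.5 * (1.0 + beta * (w1_f - wbar_d))
            # Individuals in f who adopt strategy 1 from a model in d.
            convert = p[d] * 0.5 * (1.0 + beta * (w1_d - wbar_f))
            total += m_df * (stay + convert)
        new_p.append(total)
    return new_p


if __name__ == "__main__":
    # Hypothetical starting state: strategy 1 common in one of 32 groups.
    groups = [0.0] * 32
    groups[0] = 1.0
    for _ in range(1000):
        groups = step(groups, m=0.02, s=0.1, g=1.0, p_hat=0.3, beta=0.4)
    print([round(x, 2) for x in groups])
```

Whether strategy 1 spreads, stalls, or is swamped for a given choice of m, s, g, and the threshold is exactly what a run of this kind is meant to probe, so no outcome is asserted here. Because `beta` multiplies every payoff difference, the effective within-group rate in this sketch is `beta * s` rather than `s`.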

For a novel group-beneficial trait to evolve, two things must occur. First, it 
must become common in one population, and second it must spread from that 
population to others. Various random processes may cause the initial shift of one 
population to the group-beneficial equilibrium. In finite populations, sampling 
variation in who is imitated (Gale et al., 1995) or in patterns of interaction (Kandori 
et al., 1993; Ellison, 1993; Young, 1998) can lead to random fluctuations in trait 
frequencies that can tip populations into the basin of attraction of the group- 
beneficial equilibrium. Randomly varying environments can lead to similar shifts 
(Price, Turelli, and Slatkin, 1993) in populations. Finally, individual learning can 
be conceptualized as a process in which individuals use data from the environ¬ 
ment to infer the best behavior. Learning experiences of individuals within a 
population may often be correlated, because they are utilizing the same data. 
Thus, random variation in such correlated learning experiences could also cause 
equilibrium shifts in large populations. We do not model these processes here. 

To see how imitation of the successful can lead to the spread of group- 
beneficial strategies, assume that one of these unmodeled processes causes the 
group-beneficial strategy to become common in one group, while the other 
strategy remains common in the rest of the groups. Then, if enough individuals 
in the two neighboring groups imitate behavior 1, these groups will be tipped 
into its basin of attraction, and the group-beneficial trait will increase in those 
two groups. This process is illustrated in figure 12.1. Trait 1 is initially common 
in population i − 1. In the neighboring population i, trait 2 is common, and thus 
within-group imitation tends to decrease the frequency of trait 1. However, 
individuals in population i are more likely to imitate individuals in population 
i − 1 than in population i + 1, so extra-group imitation tends to increase the 
frequency of trait 1 in group i. If this latter process is sufficiently strong, it can tip 
population i into trait 1’s basin of attraction. If this occurs, the process will be 
repeated in group i + 1, then group i + 2, and so on, with behavior 1 spreading 
throughout the population in a wave-like fashion. This process is formally 
similar to one recent model of the third phase of Wright’s shifting balance theory 
(Gavrilets, 1995), but is unlike that model in two ways. First, the underlying 
dynamic processes arise from differential imitation, not changes in demography. 
Second, because the multiple equilibria arise from frequency-dependent social 
interaction, not underdominance, the process modeled here leads to the spread 
of the group-beneficial trait for a wide range of parameters (figure 12.2). 

It is important to see that the spread of the group-beneficial trait depends 
crucially on the assumption that people imitate strategies that lead to success in 
neighboring groups, but will lower their payoff in their own group where dif¬ 
ferent norms are enforced. In this simple model, a type that restricted imitation 
to its own group would replace the type of imitation assumed here. We think 
our assumption is plausible nonetheless. Empirically, the tendency to imitate the 
successful has been observed in a wide variety of contexts (see Henrich and Gil- 
White, 2001). This tendency makes sense adaptively. The world is complex and 
hard to understand. It is very difficult in many situations to connect behavior to 
outcomes with much confidence. An individual observes that in the neighboring 
group they never marry cousins and that they are much better off. His neighbors 
say that the gods punish those who marry cousins, and they have had much 
greater success in warfare lately. Of course, the individual knows that it will 
cause trouble to forbid a marriage that both his daughter and his brother want, 
but maybe it will be worth it. The same kinds of uncertainties beset us in the 
modern world despite vastly greater information-gathering capacity. In the early 
1990s it was commonplace to attribute Japan’s economic success to encour¬ 
agement of long-term investment, their “just in time” inventory practices, or to 
their quality circles, and all of these practices were imitated by American firms 
and policy makers. We have argued at length (Boyd and Richerson, 1985) that 
cultural transmission rules like imitate the successful and imitate the common type 
should be seen as adaptations for dealing with this kind of uncertainty. We have a 
propensity to imitate the successful because it is often very difficult to decide 
what is the best behavior. These learning rules are shortcuts that on average 
allow us to acquire lots of useful information but may, as in the model in this 
chapter, sometimes lead us astray. 

Figure 12.1. This graph illustrates the assumed payoff structure and why it can lead to 
the spread of group-beneficial traits. The top panel plots the payoffs to traits 1 and 2 as a 
function of the frequency of trait 1 in their local group. Each trait has a higher relative 
payoff when it is common, but increasing the frequency of trait 1 raises the payoff of all group 
members. As a result, within-group imitation increases the frequency of trait 1 above the 
threshold frequency $\hat{p}$ and increases the frequency of trait 2 below that threshold. The 
lower panel shows the state of a part of a population in which trait 1 is initially common in 
group i − 1 and trait 2 is common in all other groups. In group i, individuals are more 
likely to imitate people in population i − 1 than in population i + 1 because the former 
have higher payoffs than the latter. Thus, between-group imitation tends to increase the 
frequency of trait 1 in population i. If this effect is strong enough, it can tip group i into the 
basin of attraction of trait 1 and cause the spread of this group-beneficial trait. 


Figure 12.2 plots combinations of the parameters m, s, $\hat{p}$, and g that lead to the 
spread of the group-beneficial strategy. It indicates that the group-beneficial 
strategy fails to spread under three circumstances. If there is too much mixing 
between neighboring groups, the beneficial strategy cannot persist in the initial 
population; it is swamped by the flow of behavior 2 from the neighboring groups. 
If there is too little mixing, the group-beneficial behavior remains common in the 
initial population but cannot spread because there is not enough interaction 
between neighbors for the beneficial effects of the norm to cause it to spread. If 
the domain of attraction of the group-beneficial strategy is too small, the flow of 
ideas from successful groups to less successful groups may not be sufficient to tip 
neighboring groups into its basin of attraction. Increasing the degree to which 
strategy 1 is group-beneficial (i.e., the magnitude of g) enlarges the range of 
parameters that lead to the increase in that strategy. 

Figure 12.2. This graph shows the range of parameters over which the beneficial norm 
spreads to all groups, eliminating the alternative norm, given that the beneficial norm is 
initially common in a single group. The vertical axis gives the ratio of m, the probability that 
individuals interact with others from one of the neighboring groups, to s, the rate of change 
due to imitation within groups. The horizontal axis plots $\hat{p}$, the unstable equilibrium that 
separates the basins of attraction of group-beneficial and nongroup-beneficial equilibria 
in isolated groups. The shaded areas give the combinations of m/s and $\hat{p}$ that lead to the 
spread of the group-beneficial strategy for three values of g. When g = 0, neither norm is 
group-beneficial. Larger values of g mean that the group-beneficial norm leads to a greater 
increase in average payoff. When m is small, the group-beneficial norm cannot spread 
because there is not enough interaction between neighbors for the beneficial effects of 
the norm to cause it to spread. Very large values of m prevent the spread of the group- 
beneficial norm because it cannot persist in the initial population. If the domain of 
attraction of the group-beneficial strategy is too small, the flow of strategies from 
successful groups to less successful groups does not tip neighboring groups into its basin of 
attraction. Increasing the degree to which strategy 1 is group-beneficial (i.e., the magnitude 
of g) enlarges the range of parameters that lead to the increase in that strategy. Here, 
the number of groups, n, was 32, but results are insensitive to n as long as it is sufficiently 
large. Very small values of n increase the range of parameters under which the group- 
beneficial trait spreads. These results are from simulation; if the group-beneficial trait 
had not spread to all groups after 10,000 time periods, we assumed it would not spread. 
To construct the graph, we chose values of m/s and then used an interval-halving algorithm 
to find the threshold value of $\hat{p}$ at which trait 1 did not spread. 

The results plotted in figure 12.3 show that the group-beneficial trait 
spreads at a rate that is roughly comparable with the rate at which individually 
beneficial traits spread within a single group under the influence of the same 
learning process. Thus, if an individually beneficial trait can spread within a 
population in 10 years, a group-beneficial trait will spread from one population 
to the next in 15-30 years, depending on the amount of mixing and the effect of 
the trait on average fitness. Game theorists have considered a number of 
mechanisms of equilibrium selection that arise because of random fluctuations in 
outcomes due to sampling variation and the finite number of players (Kandori et al., 
1993; Ellison, 1993; Young, 1998; Samuelson, 1997). These processes tend to 
pick out the equilibrium with the largest domain of attraction. However, unless 
spatial structure limits interactions to a small number of individuals, the rate at 
which this occurs in a large population is very slow. Similarly, group selection 
models appear to require unrealistically high group extinction rates to explain 
many examples of the spread of group-beneficial cultural traits (Boyd and 
Richerson, 1990; Soltis et al., 1995). In contrast, the process we describe here 
leads to the deterministic spread of the group-beneficial trait on roughly the 
same time scale on which the same social learning processes cause individually 
beneficial traits to spread within groups. 

Figure 12.3. This figure plots a measure of the length of time necessary for the spread of 
the group-beneficial trait relative to the length of time necessary for the spread of an 
individually advantageous trait. In the simulations reported, the group-beneficial trait 
spreads from one group to the next at a constant rate after an initial transient period. Here, 
we plot the time necessary to increase from a frequency of 0.1 to 0.9 in a 
single group at the boundary of the wave spreading at the constant rate, divided by the 
length of time necessary for a purely advantageous trait with dynamics $\Delta p = sp(1 - p)$ to 
spread from 0.1 to 0.9 in a single isolated population, for two different values of the ratio m/s. 
As in figure 12.1, m is the probability of interacting with, and potentially imitating, an 
individual in each of the two neighboring groups. In both graphs, g = 1.0, and the parameter $\hat{p}$ 
is the unstable equilibrium that divides the basins of attraction of the group-beneficial 
trait and the other trait. These results indicate that spatial structure causes an initially 
individually disadvantageous but group-beneficial trait to spread on roughly the same 
time scale as a simple individually advantageous trait whose within-group dynamics are 
governed by the same rate parameter s. 

Of course, we have not accounted for the processes that influence the rate at 
which the beneficial behavior initially becomes common in a particular group. 
However, if the conditions for spread are satisfied, the group-beneficial trait 
needs to become common only in a single group. If we imagine that group- 
beneficial traits mainly arise as a result of random processes in small populations, 
only the initial group, not the whole population, needs to be small, and the group 
must remain small only for long enough for random processes to give rise to an 
initial “group mutation,” which can then spread relatively rapidly to the pop¬ 
ulation as a whole. If we imagine that rare events, such as the emergence of 
uniquely charismatic reformers or alignment of the particular constellations of 
political forces, are required to effect a group-favoring innovation, the same 
considerations apply. Only one group need make the original innovation; any 
others with substantial cultural contact can rapidly acquire the trait by the 
mechanism we model here. 


Recombination at the Group Level 

The process described here readily leads to the recombination of group- 
beneficial strategies that initially arise in different groups. The exact combi¬ 
nation of strategies necessary to support complex, adaptive social institutions 
would seem unlikely to arise through a single chance event. It is much more 
plausible that complex institutions are assembled in numerous small steps. 
Previous group selection models of equilibrium selection are analogous to the 
evolution of an asexual population in that they lack any mechanism that allows 
the recombination of beneficial strategies that arise in different populations 
and thus require innovations to occur sequentially in the same lineage. Within- 
group models in which equilibrium selection occurs through random sampling 
processes assume that the population has reached a stationary distribution, 
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and thus while recombination is possible, there is no cumulative, irreversible 
change. By contrast, the present model allows recombination of different 
strategies and irreversible, cumulative change. To see this, consider a model 
in which strategies consist of two components (x, y], each with two values 


a. 



Figure 12.4. In (a), (b), and (c), the upper graph plots the frequencies of the four possible 
strategies as stacked bar graphs for each of 32 groups: [0, 0] white, (1, 0) light gray, (0, 1) 
dark gray, and (1,1) black. The lower graph plots the payoff to each strategy net of the 
group effects in each group. The (—) line gives the payoff of (0, 0} and the ( • • • ) circles 
give the payoffs of the other three strategies. The parameters are m = 0.02, s = 0.1, p = 0.4, 
and g= 2. (a) Initially (0, 1) is common in group 8 and (1, 0) is common in group 24, and 
the two group-beneficial traits begin to spread, (b) When the two spreading fronts meet, 
the frequencies of x= 1 andy= 1 are one half, which means that the strategy (1, 1) has the 
highest payoff, (c) Recombination at the individual level introduces strategy (1, 1} into the 
boundary group, and strategy (1, 1) then spreads deterministically, first in that group and 
then to adjacent groups. 
(0, 1). Let p_d and q_d be the frequencies of x = 1 and y = 1 in group d, respectively.
Let the payoff of an individual in group d be as follows:

W_d(x, y) = 1 + sx(p_d - ρ) + sy(q_d - ρ) + g(q_d + p_d)    (4)

Thus, both x = 1 and y = 1 have an independent group-beneficial effect, and all
four combinations of x and y can be stable equilibria in isolated groups. Finally,
suppose that individuals occasionally learn the x component of their strategy
from one individual and the y component from another, leading to recombination
of behavioral strategies at the individual level. Once again suppose that the
population is initially all strategy (0, 0), and that random shocks cause (1, 0) to
become common in one population and (0, 1) common in a second population.
Then, if conditions are right, both strategies will begin to spread (figure 12.4[a]).
When the two waves meet, the frequency of x = 1 is equal to one half and the
frequency of y = 1 is equal to one half at the boundary between the two expanding
fronts. The outcome depends on the value of ρ. If ρ < 1/2, the strategy
(1, 1) has the highest payoff in the group on the boundary, increases deterministically
in that group, and eventually spreads throughout the population as a
whole (figure 12.4[b]). If ρ > 1/2, the strategy (1, 1) has a lower payoff than (1, 0)
or (0, 1), and the two waves form a stable boundary. However, in the boundary
group, the most beneficial combination, (1, 1), has a relatively small payoff
disadvantage compared with (0, 1), and (0, 1) is present at substantial frequency.
In this situation, a shift to the most beneficial combination due to random shocks
is much more likely than the shifts that were necessary to cause (0, 1) and (1, 0)
to become common in the first place. Thus, existing group-beneficial traits will
recombine more rapidly than new ones arise.
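
The comparison in the boundary group can be checked numerically. The sketch below evaluates equation (4) where the frequencies of x = 1 and y = 1 are both one half. The function names and the values chosen for s, g, and the threshold parameter of equation (4) (written rho in the code) are illustrative assumptions, not values taken from the model.

```python
def payoff(x, y, p_d, q_d, s, g, rho):
    """Payoff W_d(x, y) of strategy (x, y) in group d, following equation (4).

    p_d, q_d: frequencies of x = 1 and y = 1 in the group.
    s, g, rho: model parameters; the values used below are assumed for illustration.
    """
    return 1 + s * x * (p_d - rho) + s * y * (q_d - rho) + g * (q_d + p_d)


def boundary_advantage(s, g, rho):
    """Payoff of (1, 1) minus payoff of (1, 0) in the boundary group (p_d = q_d = 1/2)."""
    return payoff(1, 1, 0.5, 0.5, s, g, rho) - payoff(1, 0, 0.5, 0.5, s, g, rho)


if __name__ == "__main__":
    # Hypothetical parameter values, one threshold below one half and one above it.
    for rho in (0.4, 0.6):
        print(rho, boundary_advantage(s=0.1, g=0.05, rho=rho))
```

The difference reduces to s(1/2 - rho), so in the boundary group (1, 1) does better than (1, 0) only when the threshold is below one half.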


Conclusion 

Many anthropologists and sociologists have long believed that human behavior is 
regulated by culturally transmitted norms in ways that promote the survival and 
growth of human societies. Economists and other rational choice theorists have 
been skeptical about such functionalist claims because there was no plausible 
mechanism to explain why such norms should be common. Social scientists 
influenced by evolutionary biology tend to share this skepticism based upon 
theoretical models and empirical findings suggesting that group selection is 
generally a weak force in nature. We believe that humans are an exception to this 
rule because cultural variation is much more susceptible to group selection than 
genetic variation. The cultural group selection hypothesis explains both why 
humans cooperate on such a large scale and why the pattern of this cooperation 
is so different from that of other ultrasocial animals (Richerson and Boyd, 1999). 
Human societies are based upon cooperation between nonrelatives, while kinship
underlies cooperation and complex sociality in other taxa like the social
insects.

Despite a general fit between the existing models of cultural group selection
and the facts of human sociality, much uncertainty remains. Earlier work suggests
that the differential survival of culturally distinctive groups can lead to the
evolution of group-beneficial behavior under plausible circumstances, but that
this process is quite slow and likely to produce historically contingent
group-level adaptations (Boyd and Richerson, 1982, 1990; Soltis et al., 1995). Since the
evolution of human social institutions does have a time scale of millennia and the 
resulting institutions are highly variable, such group selection processes may 
have had a role in shaping these institutions. On the other hand, some social 
institutions do diffuse from one society to another and on time scales shorter 
than a millennium. The spread of the joint stock company on time scales of 
a century is a recent example. Such events accord better with a mechanism like 
the one we model here. 

We suspect that both differential survival and differential diffusion may 
affect the evolution of human social institutions. The operation of many social 
institutions is opaque even to the people who enact them (Nelson and Winter, 
1982, ch. 5), and such institutions are even harder for outsiders to understand. In 
such cases, diffusion may be ineffective because actors cannot connect the attributes
of particular institutions to their success, and this fact may explain why the
path from the origins of agriculture to our complex modern industrial nations 
took some 10 millennia to traverse. Other institutions spread much more readily 
because their costs and benefits are more readily understood. Proselytizing religions,
for example, take pains to be transparent to potential converts and thus
may readily spread. The rate of diffusion of institutions may also be affected by 
how much people know about other societies. It is plausible that the spread of 
literacy and the development of ever better means of transportation have 
gradually increased the importance of the rapid processes based on borrowing 
relative to the slower ones based on group extinction. In the twentieth century, 
social institutions like central banks, soccer, and government bureaucracies have 
become all but universal in about a century. Nevertheless, globalization is incomplete;
dramatic differences exist even between modern societies (Nisbett,
Peng, Choi, and Norenzayan, 2001). Some elements of culture likely still have 
time scales of change measured in millennia. 

We thank Sam Bowles, Ernst Fehr, Daniel Friedman, Francisco Gil-White, Herb

13 The Evolution of Altruistic Punishment

With Herbert Gintis and Samuel Bowles


Unlike any other species, humans cooperate with nonkin in large 
groups. This behavior is puzzling from an evolutionary perspective because 
cooperating individuals incur individual costs to confer benefits on unrelated 
group members. None of the mechanisms commonly used to explain such behavior
allows the evolution of altruistic cooperation in large groups. Repeated interactions
may support cooperation in dyadic relations (Axelrod and Hamilton,
1981; Trivers, 1971; Clutton-Brock and Parker, 1995), but this mechanism is 
unsustainable if the number of individuals interacting strategically is larger than a 
handful (Boyd and Richerson, 1998). Interdemic group selection can lead to the 
evolution of altruism only when groups are small and migration is infrequent 
(Sober and Wilson, 1998; Eshel, 1972; Aoki, 1982; Rogers, 1990). A third recently
proposed mechanism (Hauert, De Monte, Hofbauer, and Sigmund, 2002)
requires that asocial, solitary types outcompete individuals living in uncooperative
social groups, an implausible assumption for humans.

Altruistic punishment provides one solution to this puzzle. In laboratory 
experiments, people punish noncooperators at a cost to themselves even in one-shot
interactions (Fehr and Gächter, 2002; Ostrom, Gardner, and Walker,
1994), and ethnographic data suggest that such altruistic punishment helps to
sustain cooperation in human societies (Boehm, 1993). It might seem that invoking
altruistic punishment simply creates a new evolutionary puzzle: why do
people incur costs to punish others and provide benefits to nonrelatives? However,
here we show that group selection can lead to the evolution of altruistic
punishment in larger groups because the problem of deterring free riders in the 
case of altruistic cooperation is fundamentally different from the problem of 
deterring free riders in the case of altruistic punishment. This asymmetry arises 
because the payoff disadvantage of altruistic cooperators relative to defectors is
independent of the frequency of defectors in the population, whereas the cost
disadvantage for those engaged in altruistic punishment declines as defectors
become rare because acts of punishment become very infrequent (Sethi and
Somanathan, 1996). Thus, when altruistic punishers are common, individual-level
selection operating against them is weak.

To see why, consider a model in which a large population is divided into
groups of size n. There are two behavioral types, contributors and defectors.
Contributors incur a cost (c) to produce a total benefit (b) that is shared equally
among group members. Defectors incur no costs and produce no benefits. If the
fraction of contributors in the group is x, the expected payoff for contributors is
bx - c and the expected payoff for defectors is bx, so the payoff disadvantage of
the contributors is a constant c independent of the distribution of types in the
population. Now add a third type, “punishers,” who cooperate and then punish
each defector in their group, reducing each defector’s payoff by p/n at a cost k/n
to the punisher. If the frequency of punishers is y, the expected payoffs become
b(x + y) - c to contributors, b(x + y) - py to defectors, and b(x + y) - c - k(1 - x - y)
to punishers. Contributors have higher fitness than defectors if punishers
are sufficiently common that the cost of being punished exceeds the cost of cooperating
(py > c). Punishers suffer a fitness disadvantage of k(1 - x - y) compared
with nonpunishing contributors. Thus, punishment is altruistic and mere
contributors are “second-order free riders.” Note, however, that the payoff
disadvantage of punishers relative to contributors approaches zero as defectors
become rare because there is no need for punishment. In a more realistic model
(like the one we show), the costs of monitoring or punishing occasional mistaken
defections would mean that punishers have slightly lower fitness than contributors
and that defection is the only one of these three strategies that is an
evolutionarily stable strategy in a single isolated population. However, the fact
that punishers experience only a small disadvantage when defectors are rare
means that weak within-group evolutionary forces, such as mutation (Sethi and
Somanathan, 1996) or a conformist tendency (Henrich and Boyd, 2001), can
stabilize punishment and allow cooperation to persist. But neither produces a
systematic tendency to evolve toward a cooperative outcome. Here we explore
the possibility that selection among groups leads to the evolution of altruistic
punishment when it could not maintain altruistic cooperation.
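
The three expected payoffs in this analytical model can be written out as a short sketch. The function names are hypothetical, and any value of b passed in is an assumption made for illustration, since the text does not fix one here.

```python
def expected_payoffs(x, y, b, c, p, k):
    """Expected payoffs in a group with contributor frequency x and punisher frequency y.

    The remaining fraction of the group, 1 - x - y, are defectors.
    """
    defectors = 1.0 - x - y
    benefit = b * (x + y)  # shared benefit produced by everyone who cooperates
    return {
        "contributor": benefit - c,
        "defector": benefit - p * y,
        "punisher": benefit - c - k * defectors,
    }


def punisher_disadvantage(x, y, k):
    """Payoff gap between nonpunishing contributors and punishers: k(1 - x - y)."""
    return k * (1.0 - x - y)


if __name__ == "__main__":
    # b = 0.4 is an assumed value; c, p, and k are set to 0.2, 0.8, and 0.2.
    print(expected_payoffs(x=0.2, y=0.3, b=0.4, c=0.2, p=0.8, k=0.2))
    # The gap shrinks toward zero as defectors become rare.
    print(punisher_disadvantage(x=0.0, y=0.99, k=0.2))
```

With these values contributors outperform defectors once y exceeds c/p = 0.25, while the punishers' disadvantage depends only on how many defectors remain.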

Suppose that more cooperative groups are less prone to extinction. Humans 
always live in social groups in which cooperative activities play a crucial role. In 
small-scale societies, such groups frequently become extinct (Soltis, Boyd, and 
Richerson, 1995). It is plausible that more cooperative groups are less subject to 
extinction because they are more effective in warfare, more successful in coinsuring,
more adept at managing common resources, or for similar reasons. This
means that, all other things being equal, group selection will tend to increase the
frequency of cooperation in the population. Because groups with more punishers
will tend to exhibit a greater frequency of cooperative behaviors (by both contributors
and punishers), the frequency of punishers and cooperative behaviors
will be positively correlated across groups. As a result, punishment will increase
as a “correlated response” to group selection that favors more cooperative
groups. Because selection within groups against punishment is weak when punishment
is common, this process might support the evolution of substantial levels
of punishment and maintain punishment once it is common.

To evaluate this intuitive argument we studied the following model using
simulation methods. There are N groups. Local density-dependent competition
maintains each group at a constant population size n. Individuals interact in a
two-stage “game.” During the first stage, contributors and punishers cooperate
with probability 1 - e and defect with probability e. Cooperation reduces the
payoff of cooperators by an amount c and increases the ability of the group to
compete with other groups. For simplicity, we begin by assuming that cooperation
has no effect on the individual payoffs of others, but does reduce the
probability of group extinction. Defectors always defect. During the second
stage, punishers punish each individual who defected during the first stage. After
the second stage, individuals encounter another individual from their own group
with probability 1 - m and an individual from another randomly chosen group
with probability m. An individual i who encounters an individual j imitates j with
probability W_j/(W_j + W_i), where W_x is the payoff of individual x in the game,
including the costs of any punishment received or delivered. Thus, imitation has
two distinct effects: first, it creates a selection-like process that causes higher
payoff behaviors to spread within groups. Second, it creates a migration-like
process that causes behaviors to diffuse from one group to another at a rate proportional
to m. Because cooperation has no individual-level benefits, defectors
spread between groups more rapidly than do contributors or punishers. Group
selection occurs through intergroup conflict (Bowles, 2001). In each time period,
groups are paired at random, and with probability ε, intergroup conflict results in
one group defeating and replacing the other group. The probability that group i
defeats group j is (1/2)(1 + (d_j - d_i)), where d_q is the frequency of defectors in
group q. This means that the group with more defectors is more likely to lose a
conflict. Note that cooperation is the sole target of the resulting group selection
process; punishment increases only to the extent that the frequency of punishers
is correlated with that of cooperation across groups. Finally, with probability μ
individuals of each type spontaneously switch into one of the two other types.
Mutation and erroneous defection ensure that punishers will incur some punishment
costs, even when they are common, thus placing them at a disadvantage
with respect to the contributors.
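
The two probability rules in this description, payoff-biased imitation and the outcome of intergroup conflict, are simple enough to sketch directly. This is a minimal illustration rather than the simulation code used for the results; the function names are hypothetical, and the imitation rule is assumed to be applied to positive payoffs.

```python
import random


def imitation_probability(w_i, w_j):
    """Probability that individual i imitates individual j: W_j / (W_j + W_i).

    Assumes both payoffs are positive (for example, measured from a positive baseline).
    """
    return w_j / (w_j + w_i)


def win_probability(d_i, d_j):
    """Probability that group i defeats group j, given defector frequencies d_i and d_j."""
    return 0.5 * (1.0 + (d_j - d_i))


def resolve_conflict(d_i, d_j, rng=random):
    """Return 'i' or 'j', the winner of a conflict between the two groups."""
    return "i" if rng.random() < win_probability(d_i, d_j) else "j"


if __name__ == "__main__":
    print(imitation_probability(1.0, 3.0))  # the higher-payoff individual is imitated more often
    print(win_probability(0.2, 0.7))        # the group with fewer defectors is favored
```

Two groups with equal defector frequencies each win with probability one half, and a group with no defectors always defeats a group made up entirely of defectors.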


Methods 

Two simulation programs implementing the model were independently written, 
one by R. B. in Visual Basic, and a second by H. G. in Delphi. Code is available 
on request. Results from the two programs are highly similar. In all simulations 
there were 128 groups. Initially one group consisted of all altruistic punishers 
and the other 127 groups were all defectors. Various random processes could 
cause such an initial shift. Sampling variation in who is imitated (Gale, Binmore, 
and Samuelson, 1995) could increase the frequency of punishers. Randomly 
varying environments can lead to similar shifts (Price, Turelli, and Slatkin, 1993) 
in populations. Finally, individual learning can be conceptualized as a process
in which individuals use data from the environment to infer the best behavior. 
Learning experiences of individuals within a population may often be correlated 
because they are using the same data. Thus, random variation in such correlated 
learning experiences could also cause equilibrium shifts in large populations. We 
do not model these processes here. Simulations were run for 2,000 time periods. 
The long-run average results plotted in figures 13.1-13.4 represent the average 
of frequencies over the last 1,000 time periods of 10 simulations. 

Base case parameters were chosen to represent cultural evolution in small-scale
societies. We set the time period to be 1 year. Because individually beneficial
cultural traits, such as technical innovations, diffuse through populations
in 10-100 years (Rogers, 1983), we set the cost of cooperation, c, and punishing,
k, so that traits with this cost advantage would spread in 50 time periods
(c = k = 0.2). To capture the intuition that in human societies punishment is
more costly to the punishee than to the punisher, we set the cost of being
punished to four times the cost of punishing (p = 0.8). We assume that erroneous
defection is relatively rare (e = 0.02). The migration rate, m, was set so
that in the absence of any other evolutionary forces (i.e., c = p = k = e = ε = 0),
passive diffusion will cause two neighboring groups that are initially as different
as possible to achieve the same trait frequencies in approximately 50 time periods
(m = 0.01), a value that approximates the migration rates in a number of small-scale
societies (Harpending and Rogers, 1986). We set the value of the mutation rate so that
the long-run average frequency of an ordinary adaptive trait with payoff advantage
c is approximately 0.9 (μ = 0.01). This means that mutation maintains considerable
variation, but not so much as to overwhelm adaptive forces. We assume that the
average group extinction rate is consistent with a recent estimate of cultural
extinction rates in small-scale societies, approximately 0.0075 (Soltis et al., 1995).
Because only one of the two groups entering into a conflict becomes extinct, this
implies that ε = 0.015.

Figure 13.1. The evolution of cooperation is strongly affected by the presence of
punishment. (a) The long-run average frequency of cooperation (i.e., the sum of the
frequencies of contributors and punishers) as a function of group size when there is
no punishment (p = k = 0) for three different conflict rates, 0.075, 0.015, and 0.003.
Group selection is ineffective unless groups are quite small. (b) When there is
punishment (p = 0.8, k = 0.2), group selection can maintain cooperation in
substantially larger groups.

Figure 13.2. The evolution of cooperation is strongly affected by the rate of mixing
between groups. (a) The long-run average frequency of cooperation (i.e., the sum of
the frequencies of contributors and punishers) as a function of group size when there
is no punishment (p = k = 0) for three mixing rates, 0.002, 0.01, and 0.05. Group
selection is ineffective unless groups are quite small. (b) When there is punishment
(p = 0.8, k = 0.2), group selection can maintain cooperation in larger groups for all
rates of mixing. However, at higher rates of mixing, cooperation does not persist in
the largest groups.

Figure 13.3. The evolution of cooperation is sensitive to the cost of being punished
(p). Here we plot the long-run average frequency of cooperation with the base case
cost of being punished (p = 0.8) and with a lower value of p. Lower values of p result
in much lower levels of cooperation.
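
For reference, the base-case values stated in this section can be gathered into one place. The sketch below is only a convenience container: the class and field names are hypothetical, and the conflict rate is derived from the extinction rate as described above, since only one of the two groups in a conflict becomes extinct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseCase:
    """Base-case parameter values as stated in the Methods section (names are illustrative)."""

    n_groups: int = 128              # number of groups in every simulation
    c: float = 0.2                   # cost of cooperating
    k: float = 0.2                   # cost of punishing
    p: float = 0.8                   # cost of being punished (four times k)
    e: float = 0.02                  # probability of erroneous defection
    m: float = 0.01                  # mixing (migration) rate between groups
    mu: float = 0.01                 # mutation rate
    extinction_rate: float = 0.0075  # average group extinction rate per time period

    @property
    def conflict_rate(self) -> float:
        """Each conflict removes one of two groups, so conflicts occur at twice the extinction rate."""
        return 2 * self.extinction_rate


if __name__ == "__main__":
    base = BaseCase()
    print(base.conflict_rate)
```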


Results 

Simulations using this model indicate that group selection can maintain altruistic
punishment and altruistic cooperation over a wider range of parameter values
than group selection will sustain altruistic cooperation alone. Figure 13.1 compares
the long-run average levels of cooperation with and without punishment for
a range of group sizes and extinction rates. If there is no punishment, our simulations
replicate the standard result: group selection can support high frequencies
of cooperative behavior only if groups are quite small. However, adding punishment
sustains substantial amounts of cooperation in much larger groups. As
one would expect, increasing the rate of extinction increases the long-run average
amount of cooperation.

Figure 13.4. Punishment does not aid in the evolution of cooperation when the costs
borne by punishers are fixed, independent of the number of defectors in the group.
Here we plot the long-run average frequency of cooperation when the costs of
punishing are proportional to the frequency of defectors (variable cost), fixed at a
constant cost equal to the cost of cooperating (c), and when there is no punishment.

In this model, group selection leads to the evolution of cooperation only if 
migration is sufficiently limited to sustain substantial between-group differences 
in the frequency of defectors. Figure 13.2 shows that when the migration rate 
increases, levels of cooperation fall precipitously. When punishers are common, 
defectors do badly, but when punishers are rare, defectors do well. Thus, the 
imitation of high payoff individuals creates a selection-like adaptive force that 
acts to maintain variation between groups in the frequency of defectors. However,
if there is too much migration, this process cannot maintain enough variation
between groups for group selection to be effective.

The long-run average amount of cooperation is also sensitive to the cost of 
being punished (figure 13.3). When the cost of being punished is at its base case
value (p = 4k), even a modest frequency of punishers will cause defectors to
be selected against, and, as a result, there is a substantial correlation between the
frequency of cooperation and punishment across groups. When the cost of being
punished is twice the cost of cooperation (p = 2k), punishment does not sufficiently
reduce the relative payoff of defectors, and the correlation between the
frequency of cooperators and punishers declines. Lower correlations mean that 
selection among groups cannot compensate for the decline of punishers within 
groups, and eventually both punishers and contributors decline. 

It is important to see that punishment leads to increased cooperation only 
to the extent that the costs associated with being a punisher decline as defectors 
become rare. Monitoring costs, for example, must be paid whether or not there 
are any defectors. When such costs are substantial, or when the probability of 
mistaken defection is high enough that punishers bear significant costs even 
when defectors are rare, group selection does not lead to the evolution of altruistic
punishment (figure 13.4). However, because people live in long-lasting
social groups and language allows the spread of information about who did what,
it is plausible that monitoring costs may often be small compared with enforcement
costs. This result also leads to an empirical prediction: people should
be less inclined to pay fixed than variable punishment costs if the mechanism 
outlined here is responsible for the psychology of altruistic punishment. 
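
The contrast between variable and fixed punishment costs can be made concrete with a small sketch. The function name and the `fixed_cost` argument are hypothetical; the variable cost follows the k times defector-frequency form used in the analytical model, and the fixed case simply charges the same amount however many defectors are present.

```python
def punisher_cost(defector_freq, k, fixed_cost=None):
    """Cost borne by a punisher beyond the cost of cooperating.

    With fixed_cost=None the cost is variable, k * defector_freq.
    Otherwise the punisher pays fixed_cost regardless of how many defectors are present.
    """
    if fixed_cost is None:
        return k * defector_freq
    return fixed_cost


if __name__ == "__main__":
    for d in (0.5, 0.1, 0.0):
        # Variable cost vanishes as defectors become rare; the fixed cost does not.
        print(d, punisher_cost(d, k=0.2), punisher_cost(d, k=0.2, fixed_cost=0.2))
```

Under the variable-cost rule the punishers' disadvantage disappears when there are no defectors, which is the property the argument relies on; under a fixed cost it never does.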

Further sensitivity analyses suggest that these results are robust. In addition
to the results described, we have studied the sensitivity of the model to variations
in the remaining parameter values. Decreasing the mutation rate substantially
increases the long-run average levels of cooperation. Random drift-like
processes have an important effect on trait frequencies in this model. Standard
models of genetic drift suggest that lower mutation rates will cause groups to
stay nearer the boundaries of the state space (Crow and Kimura, 1970), and our
simulations confirm this prediction. Increasing the mutation rate, on average, increases
the amount of punishment that must be administered and therefore
increases the payoff advantage of second-order free riders compared with altruistic
punishers. Increasing e, the error rate, reduces the long-run average
amount of cooperation. Reducing the number of groups, N, adds random noise
to the results. We also tested the sensitivity of the model to three structural
changes. We modified the payoffs so that each cooperative act produces a per
capita benefit of b/n for each other group member and modified the extinction
model so that the probability of group extinction is proportional to the difference
between warring groups in average payoffs including the costs of punishment,
rather than simply the difference in frequency of cooperators. The
dynamics of this model are more complicated because group selection now acts
against punishers, since punishment reduces mean group payoffs. However,
the correlated effect of group selection on cooperation still tends to increase
punishment as in the original model. The relative magnitude of these two effects
depends on the magnitude of the per capita benefit to group members of each
cooperative act, b/n. For reasonable values of b (2c, 4c, and 8c), the results of this
model are qualitatively similar to those shown. We also investigated a model in
which cooperation and punishment are characters that vary continuously from
zero to one. An individual with cooperation value x behaves like a cooperator
with probability x and like a defector with probability 1 - x. Similarly, an individual
with a punishment value y behaves like a punisher with probability y
and like a nonpunisher with probability 1 - y. New mutants are uniformly distributed.
The steady-state mean levels of cooperation in this model are similar to
the base model. Finally, we studied a model without extinction analogous to a
recent model of selection among stable equilibria because of biased imitation
(Boyd and Richerson, 2002). Populations are arranged in a ring, and individuals
imitate only individuals drawn from the neighboring two groups. Cooperative
acts produce a per capita benefit b/n so that groups with more cooperators have
higher average payoff, and thus cooperation will, all other things being equal,
tend to spread because individuals are prone to imitate successful neighbors. We
could find no reasonable parameter combination that led to significant long-run
average levels of cooperation in this last model.


Discussion 

We have shown that although the logic underlying altruistic cooperation and 
altruistic punishment is similar, their evolutionary dynamics are not. In the 
absence of punishment, within-group adaptation acts to decrease the frequency 
of altruistic cooperation, and as a consequence weak drift-like forces are insufficient
to maintain substantial variation between groups. In groups in which
altruistic punishers are common, defectors are excluded, and this maintains 
variation in the amount of cooperation between groups. Moreover, in such 
groups punishers bear few costs, and punishers decrease only very slowly in 
competition with contributors. As a result, group selection is more effective at 
maintaining altruistic punishment than altruistic cooperation. 

These results suggest that group selection can play an important role in 
the cultural evolution of cooperative behavior and moralistic punishment in 
humans. The importance of group selection is always a quantitative issue. There 
is no doubt that selection among groups acts to favor individually costly,
group-beneficial behaviors. The question is always, is group selection important under
plausible conditions? With parameter values chosen to represent cultural evolution
in small-scale societies, cooperation is sustained in groups on the order of
100 individuals. If the “individuals” in the model represent family groups (on
grounds that they migrate together and adopt common practices), altruistic punishment
could be sustained in groups of 600 people, a size much larger than typical
foraging bands and about the size of many ethno-linguistic units in nonagricultural
societies. Group selection is more effective in this model than in standard
models for two reasons: first, in groups in which defectors are rare, punishers
suffer only a small payoff disadvantage compared with contributors, and, as a
result, variation in the frequency of punishers is eroded slowly. Second,
payoff-biased imitation maintains variation among groups in the frequency of cooperation,
because in groups in which punishers are common, defectors achieve a low
payoff and are unlikely to be imitated.

It would be possible to construct an otherwise similar genetic model in 
which natural selection played the same role that payoff-biased imitation plays in 
the present model, and there is little doubt that for analogous parameter values 
the results for such a genetic model would be very similar to the results presented
here. However, such a choice of parameters would not be reasonable for a
genetic model because natural selection is typically much weaker than migration 
for small, neighboring social groups of humans. Our results (figure 13.2) suggest 
that for parameters appropriate for a genetic model, the group selection process 
modeled here will not be effective. It should be noted, however, that the genetic 
evolution of moral emotions might be favored by ordinary natural selection in 
social environments shaped by cultural group selection (Richerson and Boyd, 
1998; Bowles, Choi, and Hopfensitz). 

Cultural Evolution of Human Cooperation

With Joseph Henrich 


Cooperation 1 is a problem that has long interested evolutionists. In 
both the Origin and Descent of Man, Darwin worried about how his theory might 
handle cases such as the social insects in which individuals sacrificed their 
chances to reproduce by aiding others. Darwin could see that such sacrifices 
would not ordinarily be favored by natural selection. He argued that honeybees 
and humans were similar. Among honeybees, a sterile worker who sacrificed her 
own reproduction for the good of the hive would enjoy a vicarious reproductive 
success through her siblings. Humans, Darwin (1874:178-179) thought, competed
tribe against tribe as well as individually, and their “social and moral
faculties” evolved under the influence of group competition: 

It must not be forgotten that although a high standard of morality 
gives but slight or no advantage to each individual man and his children 
over other men of the tribe, yet that an increase in the number of well-endowed
men and an advancement in the standard of morality will certainly
give an immense advantage to one tribe over another. A tribe
including many members who, from possessing in a high degree the spirit 
of patriotism, fidelity, obedience, courage, and sympathy, were always 
ready to aid one another, and to sacrifice themselves for the common 
good, would be victorious over most other tribes; and this would be 
natural selection. 

More than a century has passed since Darwin wrote, but the debate among evolutionary
social scientists and biologists is still framed in similar terms: the conflict
between individual and prosocial behavior guided by selection on individuals
versus selection on groups. In the meantime social scientists have developed
various theories of human social behavior and cooperation: rational choice
theory takes an individualistic approach while functionalism analyzes the
group-advantageous aspects of institutions and behavior. However, unlike more traditional
approaches in the social sciences, evolutionary theories seek to explain both
contemporary behavioral patterns and the origins of the impulses, institutions, 
and preferences that drive behavior. 

In this chapter we refer to “culture” as the information stored in individual 
brains (or in books and analogous media) that was acquired by imitation of, 
or teaching by, others. Because culture can be transmitted forward through time 
from one person to another and because individuals vary in what they learn from 
others, culture has many of the same properties as the genetic system of inheritance
but also, of course, many differences. The formal import of the analogies
and disanalogies has been worked out in some analytical detail (e.g.,
Cavalli-Sforza and Feldman, 1981; Boyd and Richerson, 1985). We also subscribe
to Price’s approach to the concept of group selection. Heritable variation
between entities can appear at any level of organization, and any level above the 
individual merits the term group selection (Henrich, 2004a; Hamilton, 1975; 
Price, 1972; Sober and Wilson, 1998). Here we focus on the more conventional 
notion that selection on variation between fairly large social units counts as 
group selection. In fact, we have in mind, like Darwin and Hamilton, selection 
among tribes of at least a few hundred people, so we are referring to the cultural 
analog of what is sometimes called interdemic group selection. 


Theories of Cooperation 

We draw evidence about cooperation from many sources. Ethnographic and 
historical sources include diverse religious doctrines, norms, and customs, as well 
as folk psychology. Anthropologists and historians document an immense diversity
of human social organizations, and most of these are accompanied by
moral justifications, if often contested ones. Johnson and Earle (2000) provide a
good introduction to the vast body of data collected by sociocultural anthropologists.
Some important empirical topics are the focus of sophisticated work.
For example, the cross-cultural study of commons management is already a
well-advanced field (Baland and Platteau, 1996), drawing upon the disciplines of
anthropology, political science, and economics. 

Human Cooperation Is Extensive and Diverse 

Human patterns of cooperation are characterized by a number of features: 

• Humans are prone to cooperate, even with strangers. Many people co¬ 
operate in anonymous one-shot prisoner’s dilemma games (Marwell 
and Ames, 1981) and often vote altruistically (Sears and Funk, 
1990). People begin contributing substantially to public goods sec¬ 
tors in economic experiments (Ostrom, 1998; Falk, Fehr, and 
Fischbacher, 2002). Experimental results accord with common 
experience. Most of us have traveled in foreign cities, even poor 
foreign cities filled with strange people for whom our possessions and
spending money are worth a small fortune, and found risk of robbery 
and commercial chicanery to be small. These observations apply 
across a wide spectrum of societies, from small-scale foragers to 
modern cities in nation states (Henrich, 2004a). 

• Cooperation is contingent on many things. Not everyone cooperates. 
Aid to distressed victims increases substantially if a potential altru¬ 
ist’s empathy is engaged (Batson, 1991). Being able to discuss a game 
beforehand and to make promises to cooperate affects success 
(Dawes, van de Kragt, and Orbell, 1990). The size of the resource, 
technology for exclusion and exploitation of the resource, and 
similar gritty details affect whether cooperation in commons man¬ 
agement arises (Ostrom, 1990:202-204). Scientific findings corre¬ 
spond well to personal experience. Sometimes people cooperate 
enthusiastically, sometimes reluctantly, and sometimes not at all. 
People vary considerably in their willingness to cooperate even un¬ 
der the same environmental conditions. 

• Institutions matter. People from different societies behave differently 
because their beliefs, skills, mental models, values, preferences, and 
habits have been inculcated by long participation in societies with 
different institutions. In repeated play common property experi¬ 
ments, initial defections induce further defections until the contribu¬ 
tion to the public good sector approaches zero. However, if players 
are allowed to exercise strategies they might use in the real world 
(e.g., to punish those who defect), participation in the commons 
stabilizes a substantial degree of cooperation (Fehr and Gächter,
2002), even in one-shot (nonrepeated) contexts. Strategies for suc¬ 
cessfully managing commons are generally institutionalized in sets of 
rules that have legitimacy in the eyes of the participants (Ostrom, 
1990, ch. 2). Families, local communities, employers, nations, and 
governments all tap our loyalties with rewards and punishments and 
greatly influence our behavior. 

• Institutions are the product of cultural evolution. 2 Richard Nisbett’s 
group has shown how people’s affective and cognitive styles become 
intimately entwined with their social institutions (Cohen and Vandello,
2001; Nisbett and Cohen, 1996; Nisbett, Peng, Choi, and
Norenzayan, 2001). Because such complex traditions are so deeply 
ingrained, they are slow both to emerge and decay. Many commons 
management institutions have considerable time depths (Ostrom, 
1990, ch. 3). Throughout most of human history, institutional 
change was so slow as to be almost imperceptible by individuals. 
Today, change is rapid enough to be perceptible. The slow rate of 
change of institutions means that different populations experiencing 
the same environment and using the same technology often have 
quite different institutions (Kelly, 1985; Salamon, 1992). 

• Variation in institutions is huge. Already with its very short list of 
societies and games, the experimental ethnography approach has 
uncovered striking differences (Henrich et al., 2001; Nisbett et al.,
2001). Plausibly, design complexity, coordination equilibria, and other 
phenomena generate multiple evolutionary equilibria and much his¬ 
torical contingency in the evolution of particular institutions (Boyd 
and Richerson, 1992a); consider how different communities, univer¬ 
sities, and countries solve the same problems differently. 
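
The incentive logic behind the "Institutions matter" observation above can be illustrated with a toy linear public goods game. This is a minimal sketch and is not taken from the cited experiments: the endowment, the marginal per-capita return, the size of the fine, and the function name payoff are all assumptions chosen for illustration, and the cost that punishers themselves bear is ignored.

```python
def payoff(own, others, endowment=20.0, mpcr=0.4, fine=0.0):
    """Material payoff to one player in a linear public goods game.

    own      -- this player's contribution to the public account
    others   -- contributions of the other group members
    mpcr     -- assumed return to each member per unit contributed (< 1)
    fine     -- assumed sanction imposed on a player who contributes
                less than every other member (hypothetical rule)
    """
    public_return = mpcr * (own + sum(others))
    sanction = fine if own < min(others) else 0.0
    return endowment - own + public_return - sanction


full_contributors = [20.0, 20.0, 20.0]

# Without sanctions, free riding on three contributors beats contributing.
cooperate_no_fine = payoff(20.0, full_contributors)
defect_no_fine = payoff(0.0, full_contributors)

# With a sufficiently large fine, the ranking reverses.
defect_with_fine = payoff(0.0, full_contributors, fine=15.0)

# If everyone follows the free-riding logic, all end up worse off.
all_defect = payoff(0.0, [0.0, 0.0, 0.0])
```

With these assumed numbers a full contributor earns 32, a lone free rider 44, and a fined free rider 29, while universal defection leaves everyone with 20. The sketch shows only why sanctions can stabilize contributions; it says nothing about who pays for the sanctioning.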

Evolutionary Models Can Explain the Nature of 
Preferences and Institutions 

These facts constrain the theories we can entertain regarding the causes of human 
cooperation. For example, high levels of cooperation are difficult to reconcile 
with the rational choice theorist’s usual assumption of self-regarding preferences, 
and the diversity of institutional solutions to the same environmental problems 
challenges any theory in which institutions arise directly from universal human 
nature. The “second-generation” bounded rational choice theory, championed 
by Ostrom (1998), has begun to address these challenges from within the rational 
choice framework. These approaches add a psychological basis and institutional 
constraints to the standard rational choice theory. Experimental studies verify 
that people do indeed behave quite differently from rational selfish expectations 
(Fehr and Gächter, 2002; Batson, 1991). Although psychological and social
structures are invoked to explain individual behavior and its variation, an expla¬ 
nation for the origins and variation in psychology and social structure is not part of 
the theory of bounded rationality. 

Evolutionary theory permits us to address the origin of preferences. A num¬ 
ber of economists have noted the neat fit between evolutionary theory and 
economic theory (Hirshleifer, 1977; Becker, 1976). Evolution explains what 
organisms want, and economics explains how they should go about getting what 
they want. Without evolution, preferences are exogenous, to be estimated em¬ 
pirically but not explained. The trouble with orthodox evolutionary theory is that 
its predictions are similar to predictions from selfish rationality, as we will see. At 
the same time, unvarnished evolutionary theory does do a good job of explaining 
most other examples of animal cooperation. To do a satisfactory job of explaining 
why humans have the unusual forms of social behavior depicted in our list of 
stylized facts, we need to appeal to the special properties of cultural evolution and 
more broadly to theories of culture-gene coevolution (Henrich and Boyd, 2001; 
Richerson and Boyd, 1998, 1999; Henrich, 2004a). 

Such evolutionary models have both intellectual and practical payoffs. The 
intellectual payoff is that evolutionary models link answers to contemporary 
puzzles to crucial long timescale processes. The most important economic 
phenomenon of the past 500 years is the rise of capitalist economies and their 
tremendous impact on every aspect of human life. Expanding the timescale a bit, 
the most important phenomena of the last 10 millennia are the evolution of ever¬ 
more complex social systems and ever-more sophisticated technology following 
the origins of agriculture (Richerson, Boyd, and Bettinger, 2001). A satisfac¬ 
tory explanation of both current behavior and its variation must be linked to 
such long-run processes, where the times to reach evolutionary equilibria are 
measured in millennia or even longer spans of time. More practically, dynamism
of the contemporary world creates major stresses on institutions that manage 
cooperation. Evolutionary theory will often be useful because it will lead to an 
understanding of how to accelerate institutional evolution to better track rapid 
technological and economic change. Nesse and Williams (1995) provide an 
analogy in the context of medical practice. 

Evolutionary Models Account for the Processes That Shape 
Heritable Genetic and Cultural Variation through Time 

Evolutionary explanations are recursive. Individual behavior results from an inter¬ 
action of inherited attributes and environmental contingencies. In most species, 
genes are the main inherited attributes; however, inherited cultural information is 
also important for humans. Individuals with different inherited attributes may 
develop different behaviors in the same environment. Every generation, evolu¬ 
tionary processes—natural selection is the prototype—impose environmental 
effects on individuals as they live their lives. Cumulated over the whole popu¬ 
lation, these effects change the pool of inherited information, so that the in¬ 
herited attributes of individuals in the next generation differ, usually subtly, from 
the attributes in the previous generation. Over evolutionary time, a lineage cycles 
through the recursive pattern of causal processes once per generation, more or 
less gradually shaping the gene pool and thus the succession of individuals that 
draw samples of genes from it. Statistics that describe the pool of inherited at¬ 
tributes (e.g., gene frequencies) are basic state variables of evolutionary analysis. 
They are what change over time. 
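
The recursive pattern described above can be written as a one-generation update that is applied repeatedly. The sketch below uses the standard haploid selection recursion with two variants; the fitness values, the starting frequency, and the function names are illustrative assumptions, not parameters from the text.

```python
def next_frequency(p, w_a=1.05, w_b=1.0):
    """Frequency of variant A after one generation of selection.

    p        -- current frequency of A in the pool of inherited attributes
    w_a, w_b -- assumed average reproductive success of carriers of A and B
    """
    mean_fitness = p * w_a + (1 - p) * w_b
    return p * w_a / mean_fitness


def trajectory(p0, generations, w_a=1.05, w_b=1.0):
    """Apply the same recursion once per generation and record the state."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w_a, w_b))
    return freqs


# A rare variant with a 5 percent advantage, followed for 200 generations.
freqs = trajectory(0.01, 200)
```

The state variable here is the single number p. What happens to individuals within a generation appears only through w_a and w_b, and the output of one generation is the input to the next, which is what makes the model recursive.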

Note that in a recursive model, we explain individual behavior and population- 
level processes in the same model. Individual behavior depends, in any given 
generation, on the gene pool from which inherited attributes are sampled. The 
pool of inherited attributes depends in turn upon what happens to a population 
of individuals as they express those attributes. Evolutionary biologists have a long 
list of processes that change the gene frequencies, including natural selection, 
mutation, and genetic drift. However, no organism experiences natural selection. 
Organisms either live or die, reproduce or fail to reproduce, for concrete rea¬ 
sons particular to the local environment and the organism’s own particular at¬ 
tributes. If, in a particular environment, some types of individuals do better than 
others, and if this variation has a heritable basis, then we label as “natural se¬ 
lection” the resulting changes in gene frequencies of populations. We use abstract 
categories like selection to describe such concrete events because we wish to build 
up some useful generalizations about evolutionary process. Few would argue that 
evolutionary biology is the poorer for investing effort in this generalizing project. 

Although some of the processes that lead to cultural change are very dif¬ 
ferent from those that lead to genetic change, the logic of the two evolutionary 
problems is very similar. For example, the cultural generation time is short in the 
case of ideas that spread rapidly, but modeling the evolution of such cultural 
phenomena (e.g., semiconductor technology) presents no special problems (Boyd 
and Richerson, 1985:68-69). Similarly, human choices include ones that modify 
inherited attributes directly, rather than indirectly, by natural selection. These 
“Lamarckian” effects are easily added to models, and the models remain evolutionary
so long as rationality remains bounded (Young, 1998). Such models
easily handle continuous (nondiscrete) traits, low-fidelity transmission, and any 
number of “inferential transformations” that might occur during transmission 
(Henrich and Boyd, 2002; Cavalli-Sforza and Feldman, 1981; Boyd and Richerson,
1985). The degenerate case of omniscient rationality, of course, needs
no recursion because everything happens in the first generation (instantly in a 
typical rational choice model). The study of how genetically and culturally in¬ 
herited elements impose bounds on choice is a natural extension of the concept 
of bounded rationality (Boyd and Richerson, 1993). 


Evolution Is Multilevel 

Evolutionary theory is always multilevel; at a minimum, it keeps track of prop¬ 
erties of individuals, like their genotypes, and of the population, such as the 
frequency of a particular gene. Other levels also may be important. Individuals’
phenotypes are derived from many genes interacting with each other and the 
environment. Populations may be structured (e.g., divided into social groups 
with limited exchanges of members). Thus, evolutionary theories are systemic, 
integrating every part of biology. In principle, everything that goes into causing 
change through time plays its proper part in the theory. 

This in-principle completeness led Ernst Mayr (1982) to speak of “proxi¬ 
mate” and “ultimate” causes in biology. Proximate causes are those that phys¬ 
iologists and biochemists generally treat by asking how an organism functions. 
These are the causes produced by individuals with attributes interacting with 
environments and producing effects upon them. Do humans use innate coop¬ 
erative propensities to solve commons problems, or do they have only self- 
interested innate motives? Or are the causes more complex than either proposal? 
Ultimate causes are evolutionary. The ultimate cause of an organism’s behavior 
is the history of evolution that shaped the gene pool from which our samples of 
innate attributes are drawn. Evolutionary analyses answer why questions. Why 
do human communities typically solve at least some of the commons dilemmas 
and other cooperation problems on a scale unknown in other apes and monkeys? 
Human-reared chimpanzees are capable of many human behaviors, but they nev¬ 
ertheless retain many chimpanzee behaviors and cannot act as full members of 
a human community (Savage-Rumbaugh and Lewin, 1994; Gardner, Gardner, 
and Van Cantfort, 1989). Thus, we know that humans have different innate 
influences on their behavior than chimpanzees do, and these must have arisen in 
the course of the two species’ divergence from our common ancestor. 

In Darwinian evolutionary theories, the ultimate sources of cooperative 
behavior are classically categorized into three evolutionary processes operating at 
different levels of organization (for a framework unifying these classical divi¬ 
sions, see Henrich, 2004a): 

• Individual-level selection. Individuals and the variants they carry are 
obviously a locus of selection. Selection at this level favors selfish 
individuals who are evolved to maximize their own survival and 
reproductive success. Pairs of self-interested actors can cooperate
when they interact repeatedly (Axelrod and Hamilton, 1981; Trivers, 
1971). Alexander (1987) argued that such reciprocal cooperation can 
also explain complex human social systems, but most formal modeling 
studies make this proposal doubtful (Leimar and Hammerstein, 2001; 
Boyd and Richerson, 1989). Still, some version of Alexander’s indirect 
reciprocity is perhaps the most plausible alternative to the cultural 
group selection hypothesis that we champion here. Most such pro¬ 
posals beg the question of how humans and not other animals can take 
massive advantage of indirect reciprocity (e.g., Nowak and Sigmund, 
1998). Smith (2003) proposes to make language the key. 3 

• Kin selection. Hamilton’s (1964) articles showing that kin should 
cooperate to the extent that they share genes identical by common 
descent are one of the theoretical foundations of sociobiology. Kin 
selection can lead to cooperative social systems of a remarkable scale, 
as illustrated by the colonies of termites, ants, and some bees and 
wasps. However, most animal societies are small because individuals 
have few close relatives. It is the fecundity of insects, and in one case 
rodents, that permits a single queen to produce huge numbers of 
sterile workers and hence large, complex societies composed of close 
relatives (Campbell, 1983). 

• Group selection. Selection can act on any pattern of heritable variation 
that exists (Price, 1972). Darwin’s model of the evolution of coop¬ 
eration by intertribal competition is perfectly plausible, as far as it 
goes. The problem is that genetic variation between groups other 
than kin groups is hard to maintain unless the migration between 
groups is very small or unless some very powerful force generates 
between-group variation (e.g., Aoki, 1982; Slatkin and Wade, 1978; 
Wilson, 1983). In the case of altruistic traits, selection will tend to 
favor selfish individuals in all groups, tending to aid migration in re¬ 
ducing variation between groups. Success of kin selection in ac¬ 
counting for the most conspicuous and highly organized animal 
societies (except humans) has convinced many, but not all, evolu¬ 
tionary biologists that group selection is of modest importance in 
nature (for a group selectionist's view of the controversy, see Sober 
and Wilson, 1998). It is also important to note that the problem of 
maintenance of between-group variation applies only to altruistic/ 
cooperative traits, not to social behavior in general. Nearly all evo¬ 
lutionary biologists would agree that group selection is likely to be 
important for any social interaction with multiple stable equilibria, 
such as those coordination situations mentioned by Smith (2003). 

We could make this picture much more complex by adding higher and lower 
levels of structure. Many examples from human societies will occur to the 
reader, such as gender. Indeed, Rice (1996) has elegantly demonstrated that 
selection on genes expressed in the different sexes sets up a profound conflict of 
interest between these genes. If female Drosophila are prevented from evolving 
defenses, male genes will evolve that seriously degrade female fitness. The genome
is full of such conflicts, usually muted by the fact that an individual's genes
are forced by the evolved biology of complex organisms to all have an equal shot 
at being represented in one’s offspring. Our own bodies are a group-selected 
community of genes organized by elaborate “institutions” to ensure fairness in 
genetic transmission, such as the lottery of meiosis that gives each chromosome 
of a pair a fair chance at entering the functional gamete (Maynard Smith and 
Szathmary, 1995). 


Culture Evolves 

In theorizing about human evolution, we must include processes affecting culture 
in our list of evolutionary processes alongside those that affect genes. Culture is a 
system of inheritance. We acquire behavior by imitating other individuals much 
as we get our genes from our parents. A fancy capacity for high-fidelity imitation 
is one of the most important derived characters distinguishing us from our pri¬ 
mate relatives (Tomasello, 1999). We are also an unusually docile animal (Si¬ 
mon, 1990) and unusually sensitive to expressions of approval and disapproval 
by parents and others (Baum, 1994). Thus, parents, teachers, and peers can 
rapidly, easily, and accurately shape our behavior compared to training other 
animals using more expensive material rewards and punishments. Finally, once 
children acquire language, parents and others can communicate new ideas quite 
economically. Our own contribution to the study of human behavior is a series 
of mathematical models of what we take to be the fundamental processes of 
cultural evolution (e.g., Boyd and Richerson, 1985). Application of Darwinian 
methods to the study of cultural evolution was forcefully advocated by Campbell 
(1965, 1975). Cavalli-Sforza and Feldman (1981) constructed the first mathe¬ 
matical models to analyze cultural recursions. The list of processes that shape 
cultural change includes: 

• Biases. Humans do not passively imitate whatever they observe. 
Rather, cultural transmission is biased by decision rules that in¬ 
dividuals apply to the variants they observe or try out. The rules 
behind such selective imitation may be innate or the result of earlier 
imitation or a mixture of both. Many types of rules might be used to 
bias imitation. Individuals may try out a behavior and let reinforce¬ 
ment guide acceptance or rejection, or they may use various rules of 
thumb to reduce the need for costly trials and punishing errors. Rules 
like “copy the successful,” “copy the prestigious” (Henrich and Gil-White,
2001; Boyd and Richerson, 1985), or “copy the majority”
(Boyd and Richerson, 1985; Henrich and Boyd, 1998) allow in¬ 
dividuals to acquire rapidly and efficiently adaptive behavior across a 
wide range of circumstances and play an important role in our hy¬ 
pothesis about the origins of cooperative tendencies in human be¬ 
havior (Henrich and Boyd, 2001). 

• Nonrandom variation. Genetic innovations (mutations, recombina¬ 
tions) are random with respect to what is adaptive. Human individual 
innovation is guided by many of the same rules that are applied to
biasing ready-made cultural alternatives. Bias and learning rules have 
the effect of increasing the rate of evolution relative to what can be 
accomplished by random mutation, recombination, and natural se¬ 
lection. We believe that culture originated in the human lineage as an 
adaptation to the Plio-Pleistocene ice-age climate deterioration, 
which includes much rapid, high-amplitude variation of just the sort 
that would favor adaptation by nonrandom innovation and biased 
imitation (Richerson and Boyd, 2000a, b). 

• Natural selection. Since selection operates on any form of heritable 
variation and imitation and teaching are forms of inheritance, natural 
selection will influence cultural as well as genetic evolution. However, 
selection on culture is liable to favor different behaviors than selection 
on genes. Because we often imitate peers, culture is liable to selection 
at the subindividual level, potentially favoring pathogenic cultural 
variants—selfish memes (Blackmore, 1999). On the other hand, rules 
like conformist imitation have the opposite effect. By tending to sup¬ 
press cultural variation within groups, such rules protect variation 
between them, potentially exposing our cultural variation to much 
stronger group selection effects than our genetic variation (Soltis, 
Boyd, and Richerson, 1995; Henrich and Boyd, 1998). Human pat¬ 
terns of cooperation may owe much to cultural group selection. 

Evolutionary Models Are Consistent with a Wide Variety of Theories 

Evolutionary theory prescribes a method, not an answer, and a wide range of 
particular hypotheses can be cast in an evolutionary framework. If population- 
level processes are important, we can set up a system for keeping track of her¬ 
itable variation and the processes that change it through time. Darwinism as a 
method is not at all committed to any particular picture of how evolution works 
or what it produces. Any sentence that starts with “evolutionary theory
predicts” should be regarded with caution.

Evolutionary social science is a diverse field (Borgerhoff Mulder, Richerson, 
Thornhill, and Voland, 1997; Laland and Brown, 2002). Our own work, which 
emphasizes an ultimate role for culture and for group selection on cultural var¬ 
iation, is controversial. Many evolutionary social scientists assume that culture is 
a strictly proximate phenomenon, akin to individual learning (e.g., Alexander, 
1979), or is so strongly constrained by evolved psychology as to be virtually 
proximate (Wilson, 1998). As Alexander (1979:80) puts it, “Cultural novelties 
do not replicate or spread themselves, even indirectly. They are replicated as a 
consequence of the behavior of vehicles of gene replication.” We think both 
theory and evidence suggest that this perspective is dead wrong. Theoretical 
models show that the processes of cultural evolution can behave differently in 
critical respects from those only including genes, and much evidence is consis¬ 
tent with these models. 

Most evolutionary biologists believe that individually costly group-beneficial 
behavior can arise only as a side effect of individual fitness maximization. We 
have noted the problems with maintaining variation between groups in theory
and the seeming success of alternative explanations. Many, but by no means all, 
students of evolution and human behavior have followed the argument against 
group selection forcefully articulated by Williams (1966). 4 

However, cultural variation is more plausibly susceptible to group selection 
than is genetic variation. For example, if people use a somewhat conformist bias in 
acquiring important social behaviors, variation between groups needed for group 
selection to operate is protected from the variance-reducing force of migration 
between groups (Boyd and Richerson, 1985, 2002; Henrich and Boyd, 2001). 
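
A small two-group sketch illustrates how a conformist bias can protect between-group variation against migration. It assumes a commonly used form of the conformist recursion, p' = p + D p(1 - p)(2p - 1), applied after symmetric migration at rate m; the values of D and m, the starting frequencies, and the function names are illustrative assumptions rather than estimates from the cited models.

```python
def conform(p, d):
    """Conformist bias: the locally more common variant gains frequency."""
    return p + d * p * (1 - p) * (2 * p - 1)


def step(p1, p2, m, d):
    """One generation: symmetric migration, then conformist learning."""
    mixed1 = (1 - m) * p1 + m * p2
    mixed2 = (1 - m) * p2 + m * p1
    return conform(mixed1, d), conform(mixed2, d)


def between_group_gap(p1=0.9, p2=0.1, m=0.05, d=0.3, generations=500):
    """Difference in variant frequency between the groups after many generations."""
    for _ in range(generations):
        p1, p2 = step(p1, p2, m, d)
    return p1 - p2


gap_unbiased = between_group_gap(d=0.0)    # migration alone
gap_conformist = between_group_gap(d=0.3)  # migration plus conformist bias
```

Under these assumptions migration alone erases the difference between the two groups, whereas a moderate conformist bias holds them well apart indefinitely. A sufficiently high migration rate relative to D would still homogenize the groups, so the result is conditional on the parameter values.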


Evolution of Cooperative Institutions 

Here we summarize our theory of institutional evolution, developed elsewhere 
in more detail (Richerson and Boyd, 1998, 1999), which is rooted in a mathe¬ 
matical analysis of the processes of cultural evolution and is consistent with 
much empirical data. We make limited claims for this particular hypothesis, 
although we think that the thrust of the empirical data as summarized by the 
stylized facts is much harder on current alternatives. We make a much stronger
claim that a dual gene-culture theory of some kind will be necessary to account 
for the evolution of human cooperative institutions. 

Understanding the evolution of contemporary human cooperation requires 
attention to two different timescales: first, a long period of evolution in the 
Pleistocene epoch shaped the innate “social instincts” that underpin modern
human behavior. During this period, much genetic change occurred as a result 
of humans living in groups with social institutions heavily influenced by culture, 
including cultural group selection (Richerson and Boyd, 2001). On this time- 
scale, genes and culture coevolve, and cultural evolution is plausibly a leading 
rather than lagging partner in this process. We sometimes refer to the process 
as “culture-gene coevolution.” Then, only about 10,000 years ago, the origins
of agricultural subsistence systems laid the economic basis for revolutionary 
changes in the scale of social systems. Evidence suggests that genetic changes 
in the social instincts over the last 10,000 years are insignificant. Evolution of 
complex societies, however, has involved the relatively slow cultural accumu¬ 
lation of institutional “work-arounds” that take advantage of a psychology 
evolved to cooperate with distantly related and unrelated individuals belonging 
to the same symbolically marked “tribe” while coping more or less successfully 
with the fact that these social systems are larger, more anonymous, and more 
hierarchical than the late Pleistocene tribal-scale systems. 5 

Tribal Social Instincts Hypothesis 

Our hypothesis is premised on the idea that selection between groups plays a 
much more important role in shaping culturally transmitted variation than it 
does in shaping genetic variation. As a result, humans have lived in social en¬ 
vironments characterized by high levels of cooperation for as long as culture has 
played an important role in human development. To judge from the other living 
apes, our remote ancestors had only rudimentary culture (Tomasello, 1999) and
lacked cooperation on a scale larger than groups of close kin (Boehm, 1999). The
difficulty of constructing theoretical models of group selection on genes favoring 
cooperation matches neatly with the empirical evidence that cooperation in 
most social animals is limited to kin groups. In contrast, rapid cultural adaptation 
can lead to ample variation among groups whenever multiple stable social equilibria
arise. At least two cultural processes can maintain multiple stable equilibria:
(1) conformist social learning and (2) moralistic enforcement of norms.
Such models of group selection are relatively powerful because they require only 
the social, not physical, extinction of groups. Formal theoretical models suggest 
that conformism is an adaptive heuristic for biasing imitation under a wide 
variety of conditions (Boyd and Richerson, 1985, ch. 7; Henrich and Boyd, 1998; 
Simon, 1990), and both field and laboratory work provide empirical support
(Henrich, 2001). Models of moralistic punishment (Boyd and Richerson, 1992b;
Boyd, Gintis, Bowles, and Richerson, 2003; Henrich and Boyd, 2001) lead to
multiple stable social equilibria and to reductions in noncooperative strategies if 
punishment is prosocial. As a consequence, we believe, a growing reliance on 
cultural evolution led to larger, more cooperative societies among humans over 
the last 250,000 years or so. 

Ethnographic evidence suggests that small-scale human societies are subject 
to group selection of the sort needed to favor cooperation at a tribal scale. Soltis 
et al. (1995) analyzed ethnographic data on the results of violent conflicts among
Highland New Guinea clans. These conflicts fairly frequently resulted in the 
social extinction of clans. Many of the details of this process are consistent with 
cultural group selection. For example, social extinction does not mean physical 
elimination of the entire group. Quite the contrary, most people survive defeat 
but flee as refugees to other groups, into which they are incorporated. This sort 
of extinction cannot support genetic group selection because so many of the 
defeated survive and because they would tend to carry their unsuccessful genes 
into successful groups, rapidly running down variation between groups. How¬ 
ever, the effects of conformist cultural transmission combined with moralistic 
punishment make between-group cultural variation much less subject to erosion
by migration and within-group success of uncooperative strategies than is
true in the case of acultural organisms. 

The New Guinea cases had little information regarding the cultural variants 
that might have been favored by cultural group selection. Other examples are 
more informative in this regard. Kelly (1985) has worked out in detail the way
bridewealth customs in the Nuer and Dinka, cattle-keeping people of the 
Southern Sudan, led to the Nuer maintaining larger tribal systems. These larger 
tribes, in turn, allowed the Nuer to field larger forces than Dinka in disputes 
between the two groups. As a result, the Nuer expanded rapidly at the expense 
of the Dinka in the nineteenth and early twentieth centuries. Here, as in New 
Guinea, many Dinka lineages survived these fights and were often assimilated 
into Nuer tribes, a process, again, highly hostile to group selection on genes. The 
larger ethnographic corpus suggests that the sort of intergroup conflict described 
by Soltis and Kelly is very common, if not ubiquitous (Keeley, 1996; Otterbein, 
1970). Darwin’s picture of a group selection process operating at the level of
competing, symbolically marked tribal units with the outcome determined
by differences in “patriotism, fidelity, obedience, courage, sympathy” (Darwin, 
1874, ch. 5) and the like can work, but only upon cultural—not genetic— 
variation for such traits. 

Consistent with this argument, evidence suggests that people in late Pleis¬ 
tocene human societies cooperated on a tribal scale (Bettinger, 1991:203-205; 
Richerson and Boyd, 1998). “Tribe” is sometimes used in a technical sense to 
include only societies with fairly elaborate institutions for organizing cooperation 
among distantly related and unrelated people. We apply the term to any insti¬ 
tution that organizes interfamilial cooperation, even if it is rather simple and the 
amount of cooperation organized modest. Definitional issues aside, our claim is 
controversial because the archaeological record permits only weak inferences 
about social organization and because the spectrum of social organization in 
ethnographically known hunter-gatherers is very broad (Kelly, 1995). At the 
simple end of the spectrum are “family-level” societies (Johnson and Earle, 
2000; Steward, 1955), such as the Shoshone of the Great Basin and !Kung of the
Kalahari. Because these two groups are so simply organized, some scholars used 
them as an archetypal model for Paleolithic societies (Kelly, 1995:2). However, 
such groups are likely poor examples of the “average” Paleolithic society be¬ 
cause they inhabit and have adapted to marginal environments using subsistence 
strategies quite different from any known from the Paleolithic (R. Bettinger, 
personal communication). Also, we believe that the ethnographic societies used 
to exemplify the family level of organization actually have tribal institutions of 
some sophistication. 

Much evidence suggests that typical Paleolithic societies were more complex 
than the Shoshone or the !Kung. Many late Pleistocene societies emphasized 
big game hunting, often in resource-rich environments, rather than the 
plant foods emphasized in the marginal environments inhabited by Kalahari 
foragers and the Shoshone. For example, the Kalahari foragers (along with the 
Aranda in the Australian desert) anchor the low end of the distribution with 
respect to plant biomass found in regions of 23 ethnographically known nomadic 
foraging groups (Kelly, 1995:122). As Steward (1955) reports, big game hunting 
in ethnographic cases typically involves cooperation on a larger scale than plant 
collecting and small game hunting; thus, we should expect societies in the late 
Pleistocene to be more, not less, socially complex than the !Kung and Shoshone. 
In any case we think it an error to try to identify an archetypal Pleistocene 
society; most likely last glacial societies spanned a spectrum of social 
organization at least as large as that of ethnographically known cases. Art and settlement size 
(several hundred people) at Upper Paleolithic sites in France and Spain suggest 
that these societies were toward the complex end of the foraging spectrum (Price 
and Brown, 1985). In Central Europe, the palisades and large housing structures 
look much more like those of the Northwest Coast Indians or big-men social 
forms of New Guinea than those of the !Kung or Shoshone (Johnson and Earle, 
2000). 

Moreover, despite the marginality of their environment, the archetypal 
family-level societies do have tribal-scale institutions for dealing with environ¬ 
mental uncertainty (Wiessner, 1984). For example, the Shoshonean peoples of 
the North American Great Basin foraged for most of the year in nuclear family 
units. Resources in the basin were not only sparse but widely scattered, mili¬ 
tating against aggregation into larger units during much of the year. Although 
such bands were generally politically autonomous, they were at least tenuously 
linked into larger units. In regard to the Shoshoneans, Steward (1955:109) remarks 
that the “nuclear families have always co-operated with other families in 
various ways. Since this is so, the Shoshoneans, like other fragmented family 
groups, represent the family level of sociocultural integration only in a relative 
sense.” Winter encampments of 20 or 30 families were the largest aggregations 
among Shoshoneans; however, these were not formal organizations but rather 
aggregations of convenience. Aside from visiting, some cooperative ventures, 
such as dances (fandangos), rabbit drives, and occasional antelope drives, were 
organized during winter encampments. The number of families that a given 
family might camp with over a period of years was also not fixed, although people 
preferred to camp with people speaking the same dialect (R. Bettinger, 
personal communication). Steward’s picture of the simplicity of the Shoshone 
has been challenged. Thomas, Pendleton, and Cappannari (1986:278) observe 
that, at best, Steward’s characterization applied only to limiting cases, as, indeed, 
his frank use of them to imperfectly exemplify an ideal type suggests. Murphy 
and Murphy (1986), citing the case of the Northern Shoshone and Bannock, 
argue that the unstructured fluidity of Shoshonean society conceals a sophisti¬ 
cated adaptation to the sparse and uncertain resources of the Great Basin. The 
Shoshoneans maintained peace among themselves over a very large region, en¬ 
abling families and small groups of families to move over vast distances in re¬ 
sponse to local feast and famine. When local resources permitted and necessity 
required, they were able to assemble considerable numbers of people for col¬ 
lective purposes. Murphy and Murphy cite the formation of war parties num¬ 
bering in the hundreds to contest bison hunting areas with the Blackfeet. Indeed, 
the Shoshone and their relatives were relatively recent immigrants to the Great 
Basin who pushed out societies that were probably socially more complex 
but less well adapted to the sparse Great Basin environment (Bettinger and 
Baumhoff, 1982). Murphy and Murphy summarize by saying “the Shoshone are 
a ‘people’ in the truest sense of the word” (p. 92). Compared to our great ape 
relatives, and presumably our remoter ancestors, Shoshonean families main¬ 
tained generally friendly relations with a rather large group of other families, 
could readily strike up cooperative relations with strangers of their ethnic group, 
and organized cooperative activities on a considerable scale. 

We believe that the human capacity to live in larger-scale forms of tribal 
social organization evolved through a coevolutionary ratchet generated by the 
interaction of genes and culture. Rudimentary cooperative institutions favored 
genotypes that were better able to live in more cooperative groups. Those in¬ 
dividuals best able to avoid punishment and acquire the locally relevant norms 
were more likely to survive. At first, such populations would have been only 
slightly more cooperative than typical nonhuman primates. However, genetic 
changes, leading to moral emotions, like shame and a capacity to learn and in¬ 
ternalize local practices, would allow the cultural evolution of more sophisticated 
institutions that in turn enlarged the scale of cooperation. These successive 
rounds of coevolutionary change continued until eventually people were equipped 
with capacities for cooperation with distantly related people, emotional attachments 
to symbolically marked groups, and a willingness to punish others for 
transgression of group rules. Mechanisms by which cultural institutions might 
exert forces tugging in this direction are not far to seek. People are likely to 
discriminate against genotypes that are incapable of conforming to cultural norms 
(Richerson and Boyd, 1989; Laland, Kumm, and Feldman, 1995). People who 
cannot control their self-serving aggression end up exiled or executed in small-scale 
societies and imprisoned in contemporary ones. People whose social skills 
embarrass their families will have a hard time attracting mates. Of course, selfish 
and nepotistic impulses were never entirely suppressed; our genetically trans¬ 
mitted evolved psychology shapes human cultures, and, as a result, cultural ad¬ 
aptations often still serve the ancient imperatives of inclusive genetic fitness. 
However, cultural evolution also creates new selective environments that build 
cultural imperatives into our genes. 

Paleoanthropologists believe that human cultures were essentially modern 
by the Upper Paleolithic, 50,000 years ago (Klein, 1999, ch. 7), if not much 
earlier (McBrearty and Brooks, 2000). Thus, even if the cultural group selection 
process began as late as the Upper Paleolithic, such social selection could easily 
have had extensive effects on the evolution of human genes through this process. 
More likely, Upper Paleolithic societies were the culmination of a long period of 
coevolutionary increases in a tendency toward tribal social life. 6 

We suppose that the resulting “tribal instincts” are something like principles 
in the Chomskian linguists’ “principles and parameters” view of language 
(Pinker, 1994). Innate principles furnish people with basic predispositions, 
emotional capacities, and social dispositions that are implemented in practice 
through highly variable cultural institutions, the parameters. People are innately 
prepared to act as members of tribes, but culture tells us how to recognize who 
belongs to our tribes; what schedules of aid, praise, and punishment are due to 
tribal fellows; and how the tribe is to deal with other tribes: allies, enemies, and 
clients. The division of labor between innate and culturally acquired elements is 
poorly understood, and theory gives little guidance about the nature of the 
synergies and trade-offs that must regulate the evolution of our psychology 
(Richerson and Boyd, 2000a). The fact that human-reared apes cannot be socialized 
to behave like humans guarantees that some elements are innate. Conversely, 
the diversity and sometimes rapid change of social institutions guarantee 
that much of our social life is governed by culturally transmitted rules, skills, and 
even emotions. We beg the reader’s indulgence for the necessarily brief and as¬ 
sertive nature of our argument here. The rationale and ethnographic support for 
the tribal instincts hypothesis are laid out in more detail in Richerson and Boyd 
(1998, 1999); for a review of the broad spectrum of empirical evidence sup¬ 
porting the hypothesis, see Richerson and Boyd (2001). 

Work-around Hypothesis 

Contemporary human societies differ drastically from the societies in which our 
social instincts evolved. Pleistocene hunter-gatherer societies were comparatively 
small, egalitarian, and lacking in powerful institutionalized leadership. By contrast, 
modern societies are large, inegalitarian, and have coercive leadership institutions 
(Boehm, 1993). If the social instincts hypothesis is correct, our innate 
social psychology furnishes the building blocks for the evolution of complex 
social systems, while simultaneously constraining the shape of these systems 
(Salter, 1995). To evolve large-scale, complex social systems, cultural evolu¬ 
tionary processes, driven by cultural group selection, take advantage of whatever 
support these instincts offer. For example, families willingly take on the essential 
roles of biological reproduction and primary socialization, reflecting the ancient 
and still powerful effects of selection at the individual and kin level. At the same 
time, cultural evolution must cope with a psychology evolved for life in quite dif¬ 
ferent sorts of societies. Appropriate larger-scale institutions must regulate the 
constant pressure from smaller groups (coalitions, cabals, cliques) to subvert 
rules favoring large groups. To do this, cultural evolution often makes use of 
“work-arounds.” It mobilizes the tribal instincts for new purposes. For example, 
large national and international institutions (e.g., the great religions) develop 
ideologies of symbolically marked inclusion that often fairly successfully engage 
the tribal instincts on a much larger scale. Military and religious organizations 
(e.g., the Catholic Church) dress recruits in identical clothing (and 
haircuts) loaded with symbolic markings and then subdivide them into small 
groups with whom they eat and engage in long-term repeated interaction. Such 
work-arounds are often awkward compromises, as is illustrated by the existence 
of contemporary societies handicapped by narrow, destructive loyalties to small 
tribes (West, 1941) and even to families (Banfield, 1958). In military and reli¬ 
gious organizations, excessive within-group loyalty often subverts higher-level 
goals. If this picture of the innate constraints on current institutional evolution is 
correct, it is evidence for the existence of tribal social instincts that buttress the 
uncertain inferences from ethnography and archaeology about late Pleistocene 
societies. Complex societies are, in effect, grand natural social-psychological 
experiments that stringently test the limits of our innate dispositions to coop¬ 
erate. We expect the social institutions of complex societies to simulate life in 
tribal-scale societies in order to generate cooperative “lift.” We also expect that 
complex institutions will accept design compromises to achieve such “lift,” 
which would be unnecessary if innate constraints of a specifically tribal structure 
were absent. 


Coercive Dominance 

The cynics’ favorite mechanism for creating complex societies is command 
backed up by force. The conflict model of state formation has this character 
(Carneiro, 1970), as does Hardin’s (1968) recipe for commons management. 

Elements of coercive dominance are no doubt necessary to make complex 
societies work. Tribally legitimated self-help violence is a limited and expensive 
means of altruistic coercion. Complex human societies have to supplement 
the moralistic solidarity of tribal societies with formal police institutions. Oth¬ 
erwise, the large-scale benefits of cooperation, coordination, and division of labor 
would cease to exist in the face of selfish temptations to expropriate them by 
individuals, nepotists, cabals of reciprocators, organized predatory bands, greedy 
capitalists, and classes or castes with special access to means of coercion. At the 
same time, the need for organized coercion as an ultimate sanction creates roles, 
classes, and subcultures with the power to turn coercion to narrow advantage. 
Social institutions of some sort must police the police so that they will act in 
the larger interest to a measurable degree. Indeed, Boehm (1993) notes that the 
egalitarian social structure of simple societies is itself an institutional achieve¬ 
ment by which the tendency of some to try to dominate others on the typical 
primate pattern is frustrated by the ability of the individuals who would be 
dominated to collaborate to enforce rules against dominant behavior. Such po¬ 
licing is never perfect and, in the worst cases, can be very poor. The fact that 
leadership in complex systems always leads to at least some economic inequality 
suggests that narrow interests, rooted in individual selfishness, kinship, and, 
often, the tribal solidarity of the elite, always exert an influence. The use of 
coercion in complex societies offers excellent examples of the imperfections in 
social arrangements traceable to the ultimately irresolvable tension of more 
narrowly selfish and more inclusively altruistic instincts. 

While coercive, exploitative elites are common enough, we suspect that no 
complex society can be based purely on coercion for two reasons: (1) coercion of 
any great mass of subordinates requires that the elite class or caste be itself a 
complex, cooperative venture; (2) defeated and exploited peoples seldom accept 
subjugation as a permanent state of affairs without costly protest. Deep feelings of 
injustice generated by manifestly inequitable social arrangements move people to 
desperate acts, driving the cost of dominance to levels that cripple societies in the 
short run and often cannot be sustained in the long run (Insko et al., 1983; 
Kennedy, 1987). Durable conquests, such as those leading to the modern Euro¬ 
pean national states, Han China, or the Roman Empire, leaven raw coercion with 
other institutions. The Confucian system in China and the Roman legal system in 
the West were far more sophisticated institutions than the highly coercive sys¬ 
tems sometimes set up by predatory conquerors and even domestic elites. 


Segmentary Hierarchy 

Late Pleistocene societies were undoubtedly segmentary in the sense that supra-band 
ethnolinguistic units served social functions. The segmentary principle can 
serve the need for more command and control by hardening lines of authority 
without disrupting the face-to-face nature of proximal leadership in egalitarian 
societies. The Polynesian ranked lineage system illustrates how making political 
offices formally hereditary according to a kinship formula can help deepen and 
strengthen a command and control hierarchy (Kirch, 1984). A common method 
of deepening and strengthening the hierarchy of command and control in com¬ 
plex societies is to construct a nested hierarchy of offices, using various mixtures 
of ascription and achievement principles to staff the offices. Each level of the 
hierarchy replicates the structure of a hunting and gathering band. A leader at 
any level interacts mainly with a few near-equals at the next level down in the 
system. New leaders are usually recruited from the ranks of subleaders, often 
tapping informal leaders at that level. As Eibl-Eibesfeldt (1989) remarks, even 
high-ranking leaders in modern hierarchies adopt much of the humble headman’s 
deferential approach to leadership. Henrich and Gil-White’s (2001) work 
on prestige provides a coevolutionary explanation for this phenomenon. 

The hierarchical nesting of social units in complex societies gives rise to 
appreciable inefficiencies (Miller, 1992). In practice, brutal sheriffs, incompetent 
lords, venal priests, and their ilk degrade the effectiveness of social organizations 
in complex societies. Squires (1986) dissects the problems and potentials of 
modern hierarchical bureaucracies to perform consistently with leaders’ inten¬ 
tions. Leaders in complex societies must convey orders downward, not just seek 
consensus among their comrades. Devolving substantial leadership responsibility 
to subleaders far down the chain of command is necessary to create small-scale 
leaders with face-to-face legitimacy. However, it potentially generates great fric¬ 
tion if lower-level leaders either come to have different objectives than the upper 
leadership or are seen by followers as equally helpless pawns of remote leaders. 
Stratification often creates rigid boundaries so that natural leaders are denied 
promotion above a certain level, resulting in inefficient use of human resources 
and a fertile source of resentment to fuel social discontent. 

On the other hand, failure to properly articulate tribal-scale units with more 
inclusive institutions is often highly pathological. Tribal societies often must 
live with chronic insecurity due to intertribal conflicts. One of us once attended 
the Palio, a horse race in Siena in which each ward, or contrada, in this small 
Tuscan city sponsors a horse. Voluntary contributions necessary to pay the rider, 
finance the necessary bribes, and host the victory party amount to half a million 
dollars. The contrada clearly evoke the tribal social instincts: they each have a 
totem—the dragon, the giraffe—special colors, rituals, and so on. The race excites 
a tremendous, passionate rivalry. One can easily imagine medieval Siena in 
which swords clanged and wardmen died, just as they do or did in warfare be¬ 
tween New Guinea tribes (Rumsey, 1999), Greek city-states (Runciman, 1998), 
inner-city street gangs (Jankowski, 1991), and ethnic militias. 


Exploitation of Symbolic Systems 

The high population density, division of labor, and improved communication 
made possible by the innovations of complex societies increased the scope for 
elaborating symbolic systems. The development of monumental architecture to 
serve mass ritual performances is one of the oldest archaeological markers of 
emerging complexity. Usually an established church or less formal ideological 
umbrella supports a complex society’s institutions. At the same time, complex 
societies exploit the symbolic ingroup instinct to delimit a quite diverse array of 
culturally defined subgroups, within which a good deal of cooperation is rou¬ 
tinely achieved. Ethnic group-like sentiments in military organizations are often 
most strongly reinforced at the level of 1,000-10,000 or so men (British and 
German regiments, U.S. divisions; Kellett, 1982). Typical civilian symbolically 
marked units include nations, regions (e.g., Swiss cantons), organized tribal 
elements (Garthwaite, 1993), ethnic diasporas (Curtin, 1984), castes (Srinivas, 
1962; Gadgil and Guha, 1992), large economic enterprises (Fukuyama, 1995), 
and civic organizations (Putnam, Leonardi, and Nanetti, 1993). 
How units as large as modern nations tap into the tribal social instincts is 
an interesting issue. Anderson (1991) argues that literate communities, and the 
social organizations revolving around them (e.g., Latin literates and the Catholic 
Church), create “imagined communities,” which in turn elicit significant com¬ 
mitment from members of the community. Since tribal societies were often large 
enough that some members were not known personally to any given person, 
common membership would sometimes have to be established by the mutual 
discovery of shared cultural understandings, as simple as the discovery of a 
shared language in the case of the Shoshone. The advent of mass literacy and 
print media—Anderson stresses newspapers—made it possible for all speakers 
of a given vernacular to have confidence that all readers of the same or related 
newspapers share many cultural understandings, especially when organizational 
structures such as colonial government or business activities really did give 
speakers some institutions in common. Nationalist ideologists quickly discovered 
the utility of newspapers for building imagined communities, typically several 
contending variants of the community, making nations the dominant quasi-tribal 
institution in most of the modern world. 

Many problems and conflicts revolve around symbolically marked groups in 
complex societies. Official dogmas often stultify desirable innovations and lead to 
bitter conflicts with heretics. Marked subgroups often have enough tribal cohe¬ 
sion to organize at the expense of the larger social system. The frequent seizure of 
power by the military in states with weak institutions of civil governance is 
probably a by-product of the fact that military training and segmentation, often 
based on some form of patriotic ideology, are conducive to the formation of 
relatively effective large-scale institutions. Wherever groups of people interact 
routinely, they are liable to develop a tribal ethos. In stratified societies, powerful 
groups readily evolve self-justifying ideologies that buttress treatment of subor¬ 
dinate groups, ranging from neglectful to atrocious. White American Southerners 
had elaborate theories to justify slavery, and pioneers everywhere found the 
brutal suppression of Indian societies legitimate and necessary. The parties and 
interest groups that vie to sway public policy in democracies have well-developed 
rationalizations for their selfish behavior. A major difficulty with loyalties in¬ 
duced by appeals to shared symbolic culture is the very language-like productivity 
possible with this system. Dialect markers of social subgroups emerge rapidly 
along social fault-lines (Labov, 2001). Charismatic innovators regularly launch 
new belief and prestige systems, which sometimes make radical claims on the 
allegiance of new members, sometimes make large claims at the expense of ex¬ 
isting institutions, and sometimes grow explosively. Or larger loyalties can arise, 
as in the case of modern nationalisms overriding smaller-scale loyalties, some¬ 
times for the better, sometimes for the worse. The ongoing evolution of social 
systems can develop in unpredictable, maladaptive directions by such processes 
(Putnam, 2000). The worldwide growth of fundamentalist sects that challenge 
the institutions of modern states is a contemporary example (Marty and Appleby, 
1991). If T. Wolfe (1965) is right, mass media can be the basis of a rich diversity 
of imagined subcommunities using such vehicles as specialized magazines, 
newsletters, and websites. The potential of deviant subgroups, such as sectarian 
terrorist organizations, to use modern media to create small but highly motivated 
imagined communities is an interesting variant on Anderson’s theory. Ongoing 
cultural evolution is impossible to control completely in the larger interest, 
and forbidding free evolution tends to deprive 
societies of the “civic culture” that spontaneously produces so many collective 
benefits. 


Legitimate Institutions 

In small-scale egalitarian societies, individuals have substantial autonomy, considerable 
voice in community affairs, and can enforce fair, responsive (even self-effacing) 
behavior and treatment from leaders (Boehm, 1999). At their most 
functional, symbolic institutions, a regime of tolerably fair laws and customs, 
effective leadership, and smooth articulation of social segments can roughly 
simulate these conditions in complex societies. Rationally administered bu¬ 
reaucracies, lively markets, the protection of socially beneficial property rights, 
widespread participation in public affairs, and the like provide public and private 
goods efficiently, along with a considerable amount of individual autonomy. 
Many individuals in modern societies feel themselves part of culturally labeled 
tribal-scale groups, such as local political party organizations, that have influence 
on the remotest leaders. In older complex societies, village councils, local no¬ 
tables, tribal chieftains, or religious leaders often hold courts open to humble 
petitioners. These local leaders, in turn, represent their communities to higher 
authorities. To obtain low-cost compliance with management decisions, ruling 
elites have to convince citizens that these decisions are in the interest of the 
larger community. As long as most individuals trust that existing institutions are 
reasonably legitimate and that any felt needs for reform are achievable by means 
of ordinary political activities, there is considerable scope for large-scale col¬ 
lective social action. 

Legitimate institutions, however, and trust in them, are the result of an 
evolutionary history and are neither easy to manage nor easy to engineer. Social distance 
between different classes, castes, occupational groups, and regions is objectively 
great. Narrowly interested tribal-scale institutions abound in such societies. 
Some of these groups have access to sources of power that they are tempted to 
use for parochial ends. Such groups include, but are not restricted to, elites. The 
police may abuse their power. Petty administrators may victimize ordinary 
citizens and cheat their bosses. Ethnic political machines may evict historic elites 
from office but use chicanery to avoid enlarging their coalition. 

Without trust in institutions, conflict replaces cooperation along fault lines 
where trust breaks down. Empirically, the limits of the trusting community de¬ 
fine the universe of easy cooperation (Fukuyama, 1995). At worst, trust does not 
extend outside family (Banfield, 1958), and potential for cooperation on a larger 
scale is almost entirely foregone. Such communities are unhappy as well as poor. 
Trust varies considerably in complex societies, and variation in trust seems to be 
the main cause of differences in happiness across societies (Inglehart and Rabier, 
1986). Even the most efficient legitimate institutions are prey to manipulation 
by small-scale organizations and cabals, the so-called special interests of mod¬ 
ern democracies. Putnam et al.’s (1993) contrast between civic institutions in 
Northern and Southern Italy illustrates the difference that a tradition of functional 
institutions can make. The democratic form of the state, pioneered by 
Western Europeans in the last couple of centuries, is a powerful means of creating 
generally legitimate institutions. Success attracts imitation all around the world. 
The halting growth of the democratic state in countries ranging from Germany to 
sub-Saharan Africa is testimony that legitimate institutions cannot be drummed 
up out of the ground just by adopting a constitution. Where democracy has taken 
root outside of the European cultural orbit, it is distinctively fitted to the new 
cultural milieu, as in India and Japan. 


Conclusion 

The processes of cultural evolution quite plausibly led to group selection being a 
more powerful force on cultural than on genetic variation. The cultural system 
of inheritance probably arose in the human lineage as an adaptation to the in¬ 
creasingly variable environments of the recent past (Richerson and Boyd, 2000a, 
b). Theoretical models show that the specific structural features of cultural sys¬ 
tems, such as conformist transmission, have ordinary adaptive advantages. We 
imagine that these adaptive advantages favored the capacity for a system that 
could respond rapidly and flexibly to environmental variation in an ancestral 
creature that was not particularly cooperative. As a by-product, cultural evolution 
happened to favor large-scale cooperation. Over a long period of coevolution, 
cultural pressures reshaped “human nature,” giving rise to innate adaptations to 
living in tribal-scale social systems. Humans became prepared to use systems of 
legitimate punishment to lower the fitness of deviants, for example. We believe 
that the cultural explanation for human cooperation is in accord with much ev¬ 
idence, as summarized by stylized facts about human cooperation with which we 
introduced our remarks. More detailed surveys of the concordance of our con¬ 
jectures with various bodies of data may be found in Richerson and Boyd (1999, 
2001) and Richerson, Boyd, and Paciotti (2002). 

Regardless of the fate of any particular proposals, we think that explanations 
of human cooperation have to thread some rather tight constraints. They have to 
somehow finesse the awkward fact that humans, at least partly because of our 
ability to cooperate with distantly related people in large groups, are a huge 
success yet quite unique in our style of social life. If a mechanism like indirect 
reciprocity works, why have not many social species used it to extend their range 
of cooperation? If finding self-reinforcing solutions to coordination games is mostly 
what human societies are about, why do not other animals have massive coordi¬ 
nation-based social systems? If reputations for pairwise cooperation are easy to 
observe or signal (but unexploitable by deceptive defectors), why have we found 
no other complex animal societies based on this principle? By contrast, we do find 
plenty of complex animal societies built on the principle of inclusive fitness. 

The unique pattern of cooperation of our species suggests that human co¬ 
operation is likely to derive from some other unique feature or features of human 
life. Advanced capacities for social learning are also unique to humans; thus, 
culture is, prima facie, a plausible key element in the evolution of human 
cooperation. Our argument depends upon the existence of culture and group 
selection on cultural variation. Since sophisticated culture is unique to humans, 
we do not expect this mechanism to operate in other species. Ours is not the only 
hypothesis that passes this basic test. For example, E. Smith’s (2003) signaling 
hypothesis depends upon language, another unique feature of the human species. 
E. Hagen made a similar proposal in his comment on our background paper. He 
argued that the inventiveness of humans combined with language as a cheap 
communication device adapts us to solve problems of cooperation. We think that 
hypotheses in this vein, like Alexander’s proposed indirect reciprocity mecha¬ 
nism, cannot be decisively rejected, but they are far from completely specified. 
What is it that biases invention and cheap talk in favor of cooperative rather than 
selfish ends? The intuition that cheap talk, symbolic rewards, and clever institutions 
are in themselves sufficient to explain human cooperation probably comes from the 
common experience that people do find it rather easy to use such devices to 
cooperate (e.g., Ostrom, Gardner, and Walker, 1994). The difficult question is 
whether these are backed up by unselfish motives on the part of at least some 
people. A literal interpretation of experiments such as those of Fehr and Gächter
(2002) and Batson (1991) suggests that unselfish motives play important roles. 
However, unselfish motives may be a proximal evolutionary result of an ultimate 
indirect reciprocity sort of evolutionary process rather than the result of a group 
selection mechanism. Those who attempt deception in a world of clever
cooperators may simply expose their lack of cleverness, so that the best strategy is an
unfeigned willingness to cooperate. The data indicating that cultural group selection
is an appreciable process (Soltis et al., 1995) are also not definitive, since that process
could be weak relative to some competing process of the indirect reciprocity sort.

Another complication is that hypotheses leaning on language, technology, 
and intelligence are appealing to phenomena with considerable cultural content. 
The evolution of technology and the diffusion of innovations are cultural
processes that depend upon institutions and a sophisticated social psychology
(Henrich, 2001). Both the cultural and genetic evolution of our cognitive
capacities (some of which gave rise to language) likely emerged from a
culture-gene coevolutionary process (Henrich and McElreath, 2002; Tomasello, 1999).
Thus, these hypotheses are not, we submit, clean alternatives to the cultural 
group selection hypothesis, absent further specification. In the future, we expect 
that competing hypotheses will be developed in sufficient detail that more 
precise comparative empirical tests will be possible. 

For example, even if innatist linguists are correct that much of what we need 
to know to speak is innate, we wonder why more is not innate? Why is it that 
mutually unintelligible languages arise so rapidly? Would not we be better off if 
everyone spoke the same common entirely innate language? Not necessarily. Very 
often people from distant places are likely to have evolved different ways of doing 
things that are adaptive at home but not abroad. Similarly, avoiding listening to 
people is a wise idea if they are proposing a behavior deviant from locally
prevailing coordination equilibria. Cultural evolution can run up adaptive barriers to
communication quite readily if listening to foreigners makes one liable to acquire 
erroneous ideas (McElreath, Boyd, and Richerson 2003). Dialect evolution seems 
to be a highly nuanced system for regulating communication within languages 
as well as between them, although the adaptive significance of dialect is hardly
well worked out (Labov, 2001). Interestingly, in McElreath et al.’s model, using a
symbolic signal to express a willingness to cooperate cannot support the evolution 
of a symbolic marker of group membership because defectors as well as potential 
cooperators will be attracted by the signal. A symbolic system can be used to 
communicate intention to cooperate only if potential cooperative partners can 
exchange trustworthy signals. Once symbolic markers become sufficiently
complex as to be unfakable by defectors and a sufficiently large pool of relatively
anonymous but trustworthy signalers exists, cheap signals will be useful.
Dialect is difficult to fake although cheap to use, and once some level of
cooperation on a proto-tribal scale was possible, proto-languages might have come
under selection to create unfakable signals of group membership that imply an 
intention to cooperate. We suspect that language could have evolved only in 
concert with a measure of trust of other speakers rather than being an unaided 
generator of trust. To the extent that cooperation is the game, one has no interest 
in listening to speakers whose messages are self-serving. Think of how annoying 
we find telemarketers’ speech acts. Sociolinguists make much of the concept that
speech is a cooperative system and argue that the empirical structure of
conversation is consistent with this assumption (Wardhaugh, 1992). Language seems
to presuppose cooperation as much as it in turn facilitates cooperation. 

That technology, like language, is one of the major components of the
human adaptation is undeniable. It opens up opportunities to gain advantage from
cooperation in hunting and defense and to exploit the possibilities of the division
of labor. What is less well understood is the extent to which technology is likely 
a product of large-scale social systems. Henrich (2004b) has analyzed models of 
the “Tasmanian Effect.” At the time of European contact, the Tasmanians had 
the simplest toolkit ever recorded in an extant human society; it was, for
example, substantially simpler than the toolkits of ethnographically known foragers
in the Kalahari and Tierra del Fuego, as well as those associated with human 
groups from the Upper Paleolithic. Archaeological evidence indicates that 
Tasmanian simplicity resulted from both the gradual loss of items from their 
own pre-Holocene toolkit and the failure to develop many of the technologies 
that subsequently arose only 150 km to the north in Australia. The loss likely 
began after the Bass Strait was flooded by rising post-glacial sea levels (Jones, 
1995). Henrich's analysis indicates that imperfect inference during social 
learning, rather than stochastic loss due to drift-like effects, is the most likely 
reason for this loss. This suggests that maintaining an equilibrium toolkit as
complex as those of late Pleistocene hunter-gatherers likely required a rather large
population of people who interacted fairly freely so that rare, highly skilled 
performances, spread by selective imitation, could compensate for the routine 
loss of skills due to imperfect inference. Neanderthals and perhaps other archaic 
human populations had large brains but simple toolkits. The Tasmanian effect 
may explain why. Archaeology suggests that Neanderthal population densities 
were lower than those of the modern humans that replaced them in Europe and 
that they had less routine contact with their neighbors, as evidenced by shorter 
distance movement of high-quality raw materials from their sources compared 
to those for modern humans (Klein, 1999). 
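
The population-size argument can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions, not a reproduction of Henrich's (2004b) analysis: every learner is assumed to copy the most skilled person of the previous generation, each copy falls short of the model by an average amount `alpha` (imperfect inference), and Gumbel-distributed luck with spread `beta` occasionally produces a learner who surpasses the model. The function name, the parameter values, and the choice of the Gumbel distribution are all illustrative assumptions.

```python
import math
import random


def simulate_best_skill(n_learners, generations, alpha=5.0, beta=1.0, seed=0):
    """Track the skill of the most skilled individual across generations.

    Hypothetical toy model: each learner copies the current best performer,
    loses `alpha` units of skill on average, and gains Gumbel-distributed
    luck with scale `beta`.
    """
    rng = random.Random(seed)
    best = 0.0  # skill of the best performer in the founding generation
    for _ in range(generations):
        attempts = []
        for _ in range(n_learners):
            u = rng.random()
            while u == 0.0:  # avoid log(0) in the Gumbel transform
                u = rng.random()
            luck = -beta * math.log(-math.log(u))  # Gumbel(0, beta) draw
            attempts.append(best - alpha + luck)
        # The next generation again copies whoever turned out most skilled.
        best = max(attempts)
    return best


small_group = simulate_best_skill(n_learners=10, generations=50)
large_group = simulate_best_skill(n_learners=1000, generations=50)
print(f"10 learners:   best skill after 50 generations = {small_group:.1f}")
print(f"1000 learners: best skill after 50 generations = {large_group:.1f}")
```

Under these assumptions a group of 10 learners loses skill generation after generation, whereas a group of 1,000 learners accumulates it, because a larger pool of learners is more likely to contain the rare performance that exceeds the model.
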

The proposal that human intelligence is at the root of human cooperation is
difficult to evaluate because of the ambiguity in what we might mean by
intelligence in a comparative context (Hinde, 1970:659-663). As the Tasmanian Effect
illustrates, individual human intelligence is only a part, and perhaps only a small 
part, of being able to create complex adaptive behaviors. In fact, we think
“intelligence” plays little role in the emergence of many human complex adaptations.
Instead, humans seem to depend upon socially learned strategies to finesse the 
shortcomings of their cognitive capabilities (Nisbett and Ross, 1980). The details 
of human cognitive abilities apparently vary substantially across cultures because 
culturally transmitted cognitive styles differ (Nisbett et al., 2001). Although we
share the common intuition that humans are individually more intelligent than 
even our very clever fellow apes, we are not aware of any experiments that
sufficiently control for our cultural repertoires to be sure that it is correct. The concept
of “intelligence” in individual humans perhaps makes little sense apart from their 
cultural repertoires: humans are smart in part because they can bring a variety of 
“cultural tools” (e.g., numbers, symbols, maps, various kinematic models) to bear 
on problems. A hunter-gatherer would seem an incredibly stupid college professor, 
but college professors would seem equally dense if forced to try to survive as 
hunter-gatherers (a few knowledgeable anthropologists aside). Even abilities as 
seemingly basic as those related directly to visual perception vary across cultures 
(Segall, Campbell, and Herskovits, 1966). Second, intelligence implies a means to an 
end, not an end in itself. Individual intelligence ought to serve the ends of both 
cooperation and defection. We suspect that defection, requiring trickery
and deception, is actually better served by intelligence than is cooperation. Game theorists
assuming perfect, but selfish, rationality predict that humans should defect in the 
one-shot anonymous prisoner’s dilemma, just as evolutionary biologists predict 
that dumb beasts using evolved predispositions will. Whiten and Byrne (1988)
characterized our social intelligence as “Machiavellian,” implying that it does
indeed serve deception equally with honesty. However, just as humans punish
altruistically, they seem also to exert their political intelligence altruistically (e.g.,
Sears and Funk, 1990), biasing the evolution of institutions accordingly. On the 
basis of our brain size compared to that of other apes, Dunbar (1992) predicts that 
human groups ought to number around 50. Hunter-gatherer co-residential bands 
do number around 50, but culturally transmitted institutions web together bands to 
create tribes typically numbering a few hundred to a few thousand people, as we 
have seen. Human political systems do seem to exceed in scale anything predicted 
on the basis of enhanced Machiavellian talents (supposing that such talents can on 
average increase social scale at all). The institutional basis of these systems is not far 
to seek. For example, Wiessner (1984) describes how institutions of ceremonial 
exchange of gifts knit the famous !Kung San bands into a much larger risk-pooling
cooperative. Australian aboriginal groups show similar functional patterns, which 
are built out of quite different and substantially more elaborate sets of cultural 
practices (Peterson, 1979). Underpinning such individual-to-individual bond 
making is likely the kind of generalized trust that co-ethnics have for one another. If 
Murphy and Murphy (1986) are correct about the Northern Shoshone, a society of 
thousands constituted a functional “people” engaging in mutual aid in a hostile and
uncertain environment on the basis of little more than a common language. In his 
classic ethnography of the Nuer, Evans-Pritchard (1940) describes how simple
tribal institutions can knit herding people into tribes numbering tens of thousands,
much larger than was possible among hunter-gatherers. The size of hunter-gatherer
societies was evidently limited by low population density, not by their relatively 
unsophisticated institutions. Third, Henrich and Gil-White (2001) propose that 
human prestige systems are an adaptation to facilitate cultural transmission. Social 
learning means that the returns to effort in individual learning potentially result in 
gains for many subsequent social learners who do not have to “reinvent the wheel.” 
If extra individual effort in acquiring better ideas pays off in prestige and if prestige 
leads to fitness advantages, then the social returns to effortful individual learning 
will in part be reflected in private returns to individual learners. Group selection on 
prestige systems may further enlarge the returns to investment in individual 
learning and bring returns up to a level that reflects the group optimum amount of 
effort in individual learning. If this mechanism operates, human intelligence may 
have been enhanced by social selection emanating from institutions of prestige. 7 

We propose that group selection on cultural variation is at the heart of 
human cooperation, but we certainly recognize that our sociality is a complex 
system that includes many linked components. Surely, without punishment, 
language, technology, individual intelligence and inventiveness, ready
establishment of reciprocal arrangements, prestige systems, and solutions to games of
coordination, our societies would take on a distinctly different cast, to say the 
least. Human sociality no doubt has a number of components that were
necessary to its evolution and are necessary to its current functions. If such is the case,
prime mover explanations giving pride of place to a single mechanism are vain to 
seek. Thus, a major constraint on explanations of human sociality is its systemic 
structure. Explanations have to have a plausible historical sequence tracing how 
the currently interrelated parts evolved, perhaps piecemeal. And explanations 
have to account for the current functional and dysfunctional properties of
human social systems. We are far from having completed this task.


NOTES 

1. “Cooperation” has a broad and a narrow definition. The broad definition 
includes all forms of mutually beneficial joint action by two or more individuals. The 
narrow definition is restricted to situations in which joint action poses a dilemma for 
at least one individual such that, at least in the short run, that individual would be 
better off not cooperating. We employ the narrow definition in this chapter. The
“cooperate” versus “defect” strategies in the prisoner’s dilemma and commons
games anchor our concept of cooperation, making it more or less equivalent to the 
term “altruism” in evolutionary biology. Thus, we distinguish “coordination” (joint 
interactions that are “self-policing” because payoffs are highest if everyone does the 
same thing) and division of labor (joint action in which payoffs are highest if
individuals do different things) from cooperation.

2. We refer to cultural evolution as changes in the pool of cultural variants 
carried by a population of individuals as a function of time and the processes that 
cause the changes. 

3. It is not obvious that language potentiates indirect reciprocity. Whereas
superficially language may seem to promote the exchange of high-quality information
required for indirect reciprocity to favor cooperation, this addition merely changes the 
question slightly to one of why individuals would cooperate in information sharing; 
language merely recreates the same public goods dilemma. Lies about hunting success, 
for example, are difficult to check and often ambiguous. Among the Gunwinggu 
(Australian foragers), members of one band often lied to members of other bands 
about their success to avoid having to share meat (Altman and Peterson, 1988). 

4. Several prominent modern Darwinians, Hamilton (1975), Wilson (1975: 
561-562), Alexander (1987:169), and Eibl-Eibesfeldt (1982), have given serious 
consideration to group selection as a force in the special case of human ultra-sociality. 
They are impressed, as we are, by the organization of human populations into units 
that engage in sustained, lethal combat with other groups, not to mention other 
forms of cooperation. The trouble with a straightforward group selection hypothesis 
is our mating system. We do not build up concentrations of intrademic relatedness 
like social insects, and few demic boundaries are without considerable intermarriage. 
Moreover, the details of human combat are more lethal to the hypothesis of genetic 
group selection than to the human participants. For some of the most violent groups 
among simple societies, wife capture is one of the main motives for raids on 
neighbors, a process that could hardly be better designed to erase genetic variation 
between groups and stifle genetic group selection. 

5. We are aware that much controversy surrounds the use of microevolutionary 
models to explain macroevolutionary questions. Our thoughts on the issues are 
summarized in Boyd and Richerson (1992a). 

6. It would be a mistake to assume that complex technology is a prerequisite for 
tribal-level forms of social organization. At the time of European discovery, the 
Tasmanians had a technology substantially simpler than that of many Upper
Paleolithic peoples: they lacked bone tools, composite spears, bows, arrows, spear
throwers, and fish hooks, etc. Yet they lived in multiband groups, which controlled 
territories. Intertribal trade, warfare, and raiding were all commonplace (Jones, 
1995). The last 4,000 years of the Tasmanian archaeological record do not look much
different from many middle Paleolithic sites. 

7. Similarly, as Smith (2003) notes, Hawkes hypothesizes that men contribute 
to hunting success to “show off” and that showing off earns men reproductive 
success in terms of sexual favors from women. Contrary to what Hawkes supposes, 
this system is a possible focus of cultural group selection. In many hunter-gatherer 
groups, meat is very widely shared and hunters often do not control its distribution. 
Personal favors granted to a successful hunter as recompense for effort will benefit all 
who share his kills. Showing that individuals who contribute heavily to the common 
good are rewarded is not evidence that group-selected effects are absent. In the end, 
group selection can succeed only if altruistic individuals on average do better than 
selfish ones. The fact that hunters are not allowed to bargain with consumers of their 
kills and yet are rewarded by consumers anyway is at least as consistent with the 
operation of group selection as with a competing individualist explanation. 


PART 4

Archaeology and 
Culture History 


Historians and scientists do not always get along very well. Many 
historians view science as a procrustean enterprise whose practitioners insist 
on shoehorning complex historical phenomena into overly simple general 
laws. For their part, scientists often think that historians exaggerate the 
complexity and contingent nature of historical events, willfully refusing to see 
the order that underlies the chaos of one thing after another. This debate is
echoed in evolutionary biology where Stephen J. Gould famously upheld a
historicist version of organic evolution, a habit that made many mainstream 
evolutionary biologists hopping mad. 

In our view, these debates are rooted in a mistaken view of evolutionary 
theory. Surely historical contingency plays a role in every sort of evolution from 
the cosmic to the cultural. The Big Bang was a singular event. So was the 
evolution of our unique species (and every other unique species, for that 
matter). However, evolutionary scientists do not try to jam this complexity
into the straitjacket of general laws like those in physics. Instead, they aim to 
develop a toolkit of models and a collection of related empirical generalizations. 
The phenomena of evolution are not only complex but also diverse. No 
model and no empirical generalization is guaranteed to hold from one case 
to the next. Yet the lesson of biology is that this piecemeal approach to theory 
can yield deep insights. In chapter 19, we review the case for using a toolkit 
of simple models to explain complex and diverse phenomena like cultural 
evolution. Here we consider the role of theory-as-tools in understanding 
phenomena in which historical contingency plays a large, if not dominant, role. 

Chapter 15 discusses why evolutionary processes give rise to history— 
meaning patterns of change with time in which the same initial conditions 
result in divergent evolutionary trajectories or in which change is
nonstationary. The very simplest evolutionary models of adaptation by natural
selection give rise to trajectories that converge on unique equilibria from 
divergent initial conditions. Add simple noise or simple oscillatory
mechanisms in key processes and the change will never cease. But it is stationary and
thus will remain “lawful.” However, real evolutionary trajectories do diverge 
from identical starting points and do result in patterns whose statistical 
properties are not stationary, and this fact limits the predictive power of 
evolutionary theory. The “laws” of nature are, in effect, ever-changing. In this 
chapter, we suggest a number of means by which rather straightforward 
adaptive processes can result in divergent, nonstationary change. If the
argument is correct, the scientists’ tools should prove quite useful to historians
even if what we provide is not laws. Just demonstrating how divergence and 
nonstationarity themselves arise shows how the scientific approach can
illuminate historical questions at the most fundamental level.
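
The contrast between convergent change and noisy but stationary change can be sketched numerically. The toy recursion below is an assumption made purely for illustration and is not taken from chapter 15: a trait value `x` is pulled toward a single adaptive optimum `theta` at rate `s`, with optional Gaussian noise of standard deviation `sigma` standing in for random perturbations.

```python
import random


def trajectory(x0, steps, theta=5.0, s=0.2, sigma=0.0, seed=0):
    """Iterate x' = x + s * (theta - x) + noise and return the whole path."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(steps):
        x = path[-1]
        path.append(x + s * (theta - x) + rng.gauss(0.0, sigma))
    return path


# Without noise, very different starting points end at the same equilibrium.
low_start = trajectory(0.0, 100)
high_start = trajectory(10.0, 100)

# With noise, change never stops, but the path keeps fluctuating around theta.
noisy = trajectory(0.0, 5000, sigma=0.5)
late = noisy[2500:]
late_mean = sum(late) / len(late)

print(f"noise-free end points: {low_start[-1]:.4f}, {high_start[-1]:.4f}")
print(f"noisy run, mean of second half: {late_mean:.2f}")
```

Neither run diverges or changes its statistical character over time, which is why models of this very simple kind cannot by themselves produce the divergent, nonstationary patterns that make up history.
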

In chapter 16 we consider the problem of constructing cultural phylogenies. 
Phylogenies are useful, among other things, for controlling for the effects of 
common history in scientific studies of organic and cultural evolution. In recent 
years, evolutionary biologists have made great technical strides in the science of 
phylogeny reconstruction, and these advances have promise for application to 
cultures. The difficulty is that cultures do not have the simple branching histories 
that characterize most biological species—cross-cultural diffusion occurs in 
every domain of culture. Whether this fact causes important problems for 
phylogenetic reconstruction is an open question. Historical linguists have long 
struggled with this problem with some success. Language trees are a much used 
starting point for cultural phylogeny reconstruction, despite their obvious 
limitations. In places like aboriginal western North America, groups with 
unrelated languages often have very similar subsistence systems and even similar 
political and social organization. Even the most conservative features of language 
change rapidly so that most historical linguists believe that phylogenetic 
reconstruction is possible only for the last few thousand years. Another approach 
is to consider the phylogeny of single traits or small, tightly knit clusters of traits 
rather than of cultures as a whole. However, such items may contain too little 
historical information for accurate reconstruction. Future methodological 
innovations may solve many of these problems. In the meantime, the difficulty of 
cultural phylogeny reconstruction illustrates an important point. Humans are 
one species; our genes and our culture tend to diffuse very widely. Local 
populations are seldom if ever isolated for any substantial length of time. 
Ideologues often want to use the concept of culture like the concept of race, 
imagining that their culture has a “pure” history. In fact, all cultures have 
tangled, messy histories, even messier than our genes, if that is possible. 

Chapter 17 deals with a specific historical problem, the origins of
agriculture. This phenomenon is typical of a number of problems in human
evolution in that it is a particular nonstationary pattern: it is “progressive.” 
Human technology and probably human social complexity have increased 
more or less steadily, if at greatly different rates, seemingly since our lineage 
branched from that of the other apes. The progressive pattern is especially 
marked during the last 250,000 years or so (McBrearty and Brooks, 2000).
Many scholars are not puzzled by such patterns. To them, the obvious
explanation is that evolution is the process of replacing antique, less adaptive traits
with modern, more adaptive ones. The problem is that selective processes 
usually reach equilibria too rapidly to generate long-run progress on geological 
timescales. Evolution can produce steady progress only if the processes internal 
to the evolutionary process slow it down or if the pace of evolution is set by 
external environmental factors. The origin and spread of agriculture provides 
an interesting test case because it is among the most important events in
human history, serving, as it still does, as the subsistence basis for the evolution of
even more complex societies in the last few thousand years. Recently, the 
most popular explanations have been based on population pressure, the idea 
that humans turned to agriculture when population densities rose to the point 
that less intensive hunting and gathering techniques began to favor investment 
in agricultural production. In this chapter, we argue that population pressure 
acts at far too short a timescale to explain agricultural origins. As Malthus 
noted, population pressure builds appreciably on the generational timescale; if 
it paced cultural evolution, events would transpire at a much faster pace 
than archaeologists normally observe. Climate change is a better candidate to 
explain why agriculture first began appearing about 11,600 years ago. Recent 
advances in paleoclimatology have shown that last-glacial climates were
exceedingly variable compared to those of the period since 11,600 years ago. Climates
in the last glacial age were also mainly drier than modern ones, and lower CO2
levels may also have handicapped plant production. Agricultural subsistence is
difficult in modern climates and takes several thousand years to evolve. Perhaps
agriculture was impossible in the Pleistocene epoch. 
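
The timescale argument can be illustrated with back-of-the-envelope arithmetic. The sketch below assumes logistic population growth and an intrinsic growth rate of 1 percent per year; both are assumptions chosen for illustration rather than figures from the chapter. It computes how long a population takes to grow from 1 percent to 99 percent of its carrying capacity.

```python
import math


def years_to_fill(r, start_fraction=0.01, end_fraction=0.99):
    """Years for logistic growth to move between two fractions of carrying capacity.

    Uses the logistic solution N / (K - N) = (N0 / (K - N0)) * exp(r * t),
    solved for t. The growth rate r is per year.
    """
    start_odds = start_fraction / (1.0 - start_fraction)
    end_odds = end_fraction / (1.0 - end_fraction)
    return math.log(end_odds / start_odds) / r


years = years_to_fill(0.01)  # assumed growth rate of 1 percent per year
print(f"about {years:.0f} years")
```

Under these assumptions the population nearly fills its environment in roughly 900 years, and even a growth rate ten times lower gives only about 9,200 years. Population pressure of this kind would build far faster than the slow pace of change that archaeologists observe.
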

Our main objective in this section is not to push particular answers to 
particular historical, archaeological, and paleoanthropological problems 
(Richerson and Boyd, 2001). Rather, we want to advertise to those who 
study historical problems that cultural evolutionary theory has tools that 
students of these phenomena need in their repertoire. Even when we
cannot say much about how evolution works, we can often use a combination
of theory and empiricism to estimate the rates of change characteristic of 
different processes. Quite elementary considerations can sometimes rule some 
processes in and some out as candidate explanations for a given event. Just as 
astronomers need the theory of nuclear physics to understand how stars 
evolve, so historians, archaeologists, and paleoanthropologists need the theory 
of cultural evolution to understand human evolutionary history. 


15 How Microevolutionary
Processes Give Rise 
to History 


Over the last decade a number of authors, including ourselves, have 
attempted to understand human cultural variation using Darwinian methods. 
This work is unified by the idea that culture is a system of inheritance:
individuals vary in their skills, habits, beliefs, values, and attitudes, and these
variations are transmitted to others through time by teaching, imitation, and 
other forms of social learning. To understand cultural change, we must account 
for the microevolutionary processes that increase the numbers of some cultural 
variants and reduce the numbers of others. 

Social scientists have made a number of objections to this approach to 
understanding cultural change. Among these is the idea that culture can only be 
explained historically. Because the history of any given human society is a
sequence of unique and contingent events, explanations of human social life, it is
argued, are necessarily interpretive and particularistic. Present phenomena are 
best explained mainly in terms of past contingencies, not ahistorical adaptive 
processes that would erase the trace of history. Like other scientific (rather than 
historical) explanations of human cultures, the argument goes, Darwinian 
models cannot account for the lack of correlation of environmental and cultural 
variation, nor the long-term trends in cultural change. 

In this chapter, we defend the Darwinian theories of cultural change against 
this objection by suggesting that several cultural evolutionary processes can give 
rise to divergent evolutionary developments, secular trends, and other features 
that can generate unique historical sequences for particular societies. We also 
argue that Darwinian theory offers useful tools for those interested in
understanding the evolution of particular societies. Essentially similar processes act in
the case of organic evolution. Darwinian theory is both scientific and historical. 
The history of any evolving lineage or culture is a sequence of unique, contingent
events. Similar environments often give rise to different evolutionary
trajectories, even among initially similar taxa or societies, and some show very long-run
trends in features such as size. Nonetheless, these historical features of organic 
and cultural evolution can result from a few simple microevolutionary processes. 

A proper understanding of the relationship between the historical and the 
scientific is important for progress in the social and biological sciences. There is 
(or ought to be) an intimate interplay between the study of the unique events of 
given historical sequences and the generalizations about process constructed by 
studying many cases in a comparative and synthetic framework. The study of 
unique cases furnishes the data from which generalizations are derived, while the 
generalizations allow us to understand better the processes that operated on 
particular historical trajectories. We cannot neglect the close, critical study of 
particular cases without putting the database for generalization in jeopardy. 
Besides, we often have legitimate reasons to be curious about exactly how 
particular historical sequences, such as the evolution of Homo sapiens, occurred. 
On the other hand, it is from the study of many cases that we form a body of 
theory about evolutionary processes. No one historical trajectory contains enough 
information to obtain a very good grasp of the processes that affected its own 
evolution. Data are missing because the record is imperfect. The lineage may be 
extinct, and so direct observation is impossible. Even if the lineage is extant, 
experimentation may be impossible for practical or ethical reasons. Potential 
causal variables may be correlated in particular cases, so understanding their 
behavior may be impossible. The comparative method can often clarify such 
cases. “Scientists” need “historians” and vice versa. 


Darwinian Models of Cultural Evolution 

Over the past two decades, a number of scholars have attempted to understand 
the processes of cultural evolution in Darwinian terms. Social scientists
(Campbell, 1965, 1975; Cloak, 1975; Durham, 1976; Ruyle, 1973) have argued that the
analogy between genetic and cultural transmission is the best basis for a general 
theory of culture. Several biologists have considered how culturally transmitted 
behavior fits into the framework of neo-Darwinism (Pulliam and Dunford, 1980; 
Lumsden and Wilson, 1981; Boyd and Richerson, 1985; Richerson and Boyd, 
1989b; Cavalli-Sforza and Feldman, 1983; Rogers, 1989). Other biologists and 
psychologists have used the formal similarities between genetic and cultural 
transmission to develop theories describing the dynamics of cultural transmission 
(Cavalli-Sforza and Feldman, 1973, 1981; Cloninger, Rice, and Reich, 1979; 
Eaves, Last, Young, and Martin, 1978). All of these authors are interested in a 
synthetic theory of process applying to how culture works in all cultures,
including in other species that might have systems with a useful similarity to human
culture. Note that this last broadly comparative concern is likely to be useful in 
dissecting the reasons why the human lineage originally became more cultural 
than typical mammals. 1 

The idea that unifies the Darwinian approach is that culture constitutes a
system of inheritance. People acquire skills, beliefs, attitudes, and values from 
others by imitation and enculturation (social learning), and these “cultural 
variants,” together with their genotypes and environments, determine their 
behavior. Since determinants of behavior are communicated from one person to 
another, individuals sample from and contribute to a collective pool of ideas that 
changes over time. In other words, cultures have population-level properties
similar to those of gene pools, different as the two systems of inheritance are in
the details of how they work. (In one respect, the Darwinian study of cultural
evolution is more Darwinian than the modern theory of organic evolution. 
Darwin not only used a notion of “inherited habits” that is much like the 
modern concept of culture but also thought that organic evolution generally 
included the property of the inheritance of acquired variation, which culture 
does and genes do not.) 

Because cultural change is a population process, it can be studied using 
Darwinian methods. To understand why people behave as they do in a particular 
environment, we must know the nature of the skills, beliefs, attitudes, and values 
that they have acquired from others by cultural inheritance. To do this, we must 
account for the processes that affect cultural variation as individuals acquire 
cultural traits, use the acquired information to guide behavior, and act as models 
for others. What processes increase or decrease the proportion of people in a 
society who hold particular ideas about how to behave? We thus seek to
understand the cultural analogs of the forces of natural selection, mutation, and
drift that drive genetic evolution. We divide these forces into three classes: 
random forces, natural selection, and the decision-making forces. 

Random forces are the cultural analogs of mutation and drift in genetic 
transmission. Intuitively, it seems likely that random errors, individual
idiosyncrasies, and chance transmission play a role in behavior and social learning. For
example, linguists have documented a good deal of individual variation in 
speech, some of which is probably random individual variation (Labov, 1972). 
Similarly, small human populations might well lose rare skills or knowledge by 
chance, for example, due to the premature deaths of the only individuals who 
acquired them (Diamond, 1978). 

Natural selection may operate directly on cultural variation. Selection is an 
extremely general evolutionary process (Campbell, 1965). Darwin formulated a 
clear statement of natural selection without a correct understanding of genetic 
inheritance because it is a force that will operate on any system of inheritance 
with a few key properties. There must be heritable variation, the variants must 
affect phenotype, and the phenotypic differences must affect individuals’ chances 
of transmitting the variants they carry. That variants are transmitted by imitation 
rather than sexual or asexual reproduction does not affect the basic argument, 
nor does the possibility that the source of variation is not random. Darwin 
imagined that random variation, acquired variation, and natural selection all 
acted together as forces in organic evolution. In cultural evolution, they
do seem to act together. It may well be, however, that the behavioral variants
favored by natural selection depend on the mode of transmission. The behaviors
that maximize numbers of offspring may not be the same as those that maximize
cultural influence on future generations (Boyd and Richerson, 1985). 

Decision-making forces result when individuals evaluate alternative
behavioral variants and preferentially adopt some variants relative to others. If many of
the individuals in a population make similar decisions about variants, especially 
if similar decisions are made for a number of generations, the pool of cultural 
variants can be transformed. Naive individuals may be exposed to a variety of 
models and preferentially imitate some rather than others. We call this force 
biased transmission. Alternatively, individuals may modify existing behaviors or 
invent new ones by individual learning. If the modified behavior is then
transmitted, the resulting force is much like the guided, nonrandom variation of
“Lamarckian” evolution. Put differently, humans are embedded in a complex
social network through which they actively participate in the creation and
perpetuation of their culture.

The decision-making forces are derived forces (Campbell, 1965). Decisions 
require rules for making them, and ultimately the rules must derive from the 
action of other forces. That is, if individual decisions are not to be random, there 
must be some sense of psychological reward or similar process that causes 
individual decisions to be predictable, in given environments, at least. These 
decision-making rules may be acquired during an earlier episode of cultural 
transmission, or they may be genetically transmitted traits that control the 
neurological machinery for acquisition and retention of cultural traits. The latter 
possibility is the basis of the sociobiological hypotheses about cultural evolution 
(Alexander, 1979; Lumsden and Wilson, 1981). Some authors argue that the 
course of cultural evolution is determined by natural selection operating
indirectly on cultural variation through the decision-making forces.

Like natural selection, the decision-making forces may improve the fit of the 
population to the environment. The criteria of fit depend on the nature of the 
underlying decision rules. This is easiest to see when the goals of the decision 
rules are closely correlated with fitness. If human foraging practices are adopted 
or rejected according to their energy payoff per unit time (optimal foraging 
theory’s operational proxy for fitness), then the foraging practices used in the 
population will adapt to changing environments much as if natural selection 
were responsible. If the adoption of foraging practices is strongly affected by
considerations of prestige, say, the prestige associated with male success in
hunting dangerous prey, then the resulting pattern of behavior may be different.
There will still be a pattern of adaptation to different environments, but now
in the sense of increasing prestige rather than calories.


What Makes Change Historical? 

It has often been argued that historical and scientific explanations are different in
kind. Ingold (1985) gives two important versions of this argument. Some authors 
(e.g., Collingwood, 1946) argue that history is uniquely human because it entails 
conscious perception of the past. The second view (e.g., Trigger, 1978) is quite 
different and holds that history involves unique, contingent pathways from the
past to the future that are strongly influenced by unpredictable, chance events.
We focus on the latter view here. For example, capitalism arose in Europe rather
than China, perhaps because medieval and early modern statesmen failed to
create a unified empire in the West (McNeill, 1980), and marsupials dominate
the Australian fauna perhaps because of Australia’s isolation from other
continents in which placental mammals chanced to arise. In contrast, it is argued,
scientific explanations involve universally applicable laws. In evolutionary
biology and in anthropology, these often take the form of functional explanations, in
which only knowledge of present circumstances and general physical laws (e.g.,
the principles of mechanics) are necessary to explain present behavior (Mitchell
and Valone, 1990). For example, long fallow horticulture is associated with 
tropical forest environments, perhaps because it is the most efficient subsistence 
technology in such environments (Conklin, 1969). 

It has been argued, perhaps nearly as often, that this dichotomy is false. 
Eldredge (1989:9) provides a particularly clear and forceful example of a
common objection: all material entities have properties that can change through time.
Even simple entities like molecules are characterized by position, momentum,
charge, and so on. If we could follow a particular water molecule, we would see
that these properties changed through time—even the water molecule has a
history, according to Eldredge. Yet everyone agrees that we can achieve a satisfactory
scientific theory of water. Historical explanations, Eldredge argues, are just
scientific explanations applied to systems that change through time. We are misled
because chemists tend to study the average properties of very large numbers of 
water molecules. 

This argument explains too much. Not all change with time is history in the 
sense intended by historically oriented biologists and social scientists. To see this, 
consider an electrical circuit composed of a voltage source, a capacitor, and a
fluorescent light. Under the right conditions, the voltage will oscillate through time,
and these changes can be described by simple laws. Are these oscillations
historical? In Eldredge’s view they are; the circuit has a history, a quite boring one,
but a history nonetheless. Yet such a system does not generate unique and
contingent trajectories. After the system settles down, one oscillation is just like
the previous one, and the period and amplitude of the oscillations are not
contingent on initial conditions. They are not historical in the sense that “one
damn thing after another” (Elton 1967:40) leads to cumulative but
unpredictable change.

What then makes change historical? We think that two requirements
capture much of what is meant by “history.” These two requirements pose a more
interesting and serious challenge for reconciling history with a scientific approach 
to explanation. Consider a system like a society or a population that changes 
through time both under the influence of internal dynamics and exogenous 
shocks. Then we suggest that the pattern of change is historical if the following 
statements apply. 

1. Trajectories are not stationary on the time scales of interest. History is more 
than just change—it is change that does not repeat itself. On long enough 
timescales, the oscillations in the circuit become stationary, meaning that the 
chance of finding the system in any particular state becomes constant. Similarly,
random day-to-day fluctuations in the weather do not constitute historical
change if one is interested in organic evolution because, on long evolutionary
timescales, there will be so many days of rain, so many days of sun, and so on. By
choosing a suitably long period of time, we can construct a scientific theory of
stationary processes using a statistical rather than strictly deterministic approach.
In the case of nonstationary historical trajectories, a society or biotic lineage
tends to become gradually more and more different as time goes by. There is no
possibility of basing explanation on, say, a long-run mean about which the
historical entity fluctuates in some at least statistically predictable way, because the
mean calculated over longer and longer runs of data continues to change
significantly. One of the most characteristic statistical signatures of nonstationary
processes is that the variance they produce grows with time rather than
converging on a finite value. Note that a process that is historical in one
spatio-temporal frame may not be in another. If we are not too interested in specific
species or societies in given time periods, we can often average over longer
periods of time or many historical units to extract ahistorical generalizations.
Any given water molecule has a history, but it is easy to average over many of 
them and ignore this fact. 

2. Similar initial conditions give rise to qualitatively different trajectories.
Historical change is strongly influenced by happenstance. This requires that the
dynamics of the system be path-dependent; isolated populations or
societies must tend to diverge even when they start from the same initial condition
and evolve in similar environments. Thus, for example, the spread of a favored
allele in a series of large populations is not historical. Once the allele becomes
sufficiently common, it will increase at first exponentially and then slowly,
asymptotically approaching fixation. Small changes in the initial frequencies,
population size, or even degree of dominance will not lead to qualitative changes in
this pattern. In separate but similar environments, populations will converge on 
the favored allele. Examples of convergence in similar environments are common— 
witness the general similarity in tropical forest trees and many of the behaviors 
of the long fallow cultivators who live among them the world over. On the other 
hand, there are also striking failures of convergence—witness the many unique 
features of Australian plants, animals, and human cultures. The peculiar hanging 
leaves of eucalypts, the bipedal gait of kangaroos, and the gerontocratic structure 
of Australian aboriginal societies make them distinctively different from the 
inhabitants of similar temperate and subtropical dry environments on other 
continents. 
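The statistical signature named in the first criterion, variance that grows with time rather than converging on a finite value, can be illustrated numerically. The sketch below is only an illustration under assumptions of our own choosing: the update rule, the `pull` parameter, and the run counts are hypothetical and do not come from any model in this chapter. It compares a random walk with a process that is pulled back toward a fixed mean.

```python
import random

def across_run_variance(n_steps, n_runs, pull, seed=0):
    """Variance across independent runs of x at each time step.

    Each run follows x[t+1] = (1 - pull) * x[t] + noise, with unit-variance
    Gaussian noise. pull = 0 is a random walk (nonstationary); 0 < pull < 1
    pulls x back toward zero, giving a stationary process.
    """
    rng = random.Random(seed)
    xs = [0.0] * n_runs  # every run starts from the same state
    variances = []
    for _ in range(n_steps):
        xs = [(1.0 - pull) * x + rng.gauss(0.0, 1.0) for x in xs]
        mean = sum(xs) / n_runs
        variances.append(sum((x - mean) ** 2 for x in xs) / n_runs)
    return variances

walk = across_run_variance(n_steps=200, n_runs=1000, pull=0.0)
reverting = across_run_variance(n_steps=200, n_runs=1000, pull=0.2)

# Print the across-run variance after 50, 100, and 200 steps.
for t in (49, 99, 199):
    print(t + 1, round(walk[t], 1), round(reverting[t], 2))
```

In the random walk the variance among runs keeps rising roughly in proportion to elapsed time, so no long-run mean is available to anchor an explanation. In the mean-reverting process the variance levels off, and a statistical description over a long enough window becomes possible.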

It is important not to blur the distinction between simple trajectories and 
true historical change; it is easy to see how evolutionary processes like natural 
selection give rise to simple, regular change like the spread of a favored allele or 
subsistence practice. However, it is not so easy to see how such processes give 
rise to unique, contingent pathways. Scientists tout the approach to steady states 
and convergence in similar situations as evidence for the operation of natural
“laws,” so it seems natural to conclude that a lack of stationarity and
convergence is evidence of processes that cannot be subsumed in the standard
conceptions of science. Our argument is that things are not at all that simple. There
is every reason to expect that perfectly ordinary scientific processes, ordinary in
the sense that they result from natural causes and are easily understood by
conventional methods, regularly generate history in the sense defined by these 
two criteria. 


How Do Adaptive Processes Give Rise to History? 

Let us begin with the two most straightforward answers to this question. First, 
it could be that most evolutionary change is random. Much change in organic 
evolution may be the result of drift and mutation, and much change in cultural 
evolution may result from analogous processes. The fact that drift is a very slow 
process would explain long-term evolutionary trends. Raup (1977) and others 
argue that random-walk models produce phylogenies that are remarkably similar 
to real ones. The fact that cultural and genetic evolutionary change is random 
would allow populations in similar environments to diverge from each other. It 
seems likely that some variation in both cases evolves mainly under the influence 
of nonadaptive forces—for example, much of the eukaryotic genome does not 
seem to be expressed and evolves under the influence of drift and mutation 
(Futuyma, 1986:447). Similarly, the arbitrary character of symbolic variation 
suggests that nonadaptive processes are likely to be important in linguistic 
change and similar aspects of culture. In both cases, isolated populations diverge 
at an approximately constant rate on the average. However, to understand why a 
particular species is characterized by a particular DNA sequence, or why a
particular people use a particular word for mother, one must investigate the
sequence of historical events that led to the current state.
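The divergence of isolated populations under purely random change can be sketched with a neutral sampling model. The code below is a hedged illustration, not the authors' model: it assumes a haploid Wright-Fisher scheme, and the population size, number of populations, and function names are hypothetical choices. A "variant" may be read as an allele or as a word.

```python
import random

def drift(pop_size, n_pops, n_generations, p0=0.5, seed=0):
    """Neutral Wright-Fisher drift in isolated haploid populations.

    Returns a list with one entry per generation, each entry holding the
    variant's frequency in every population. No selection acts; each
    generation is a random sample of pop_size individuals drawn from the
    previous generation.
    """
    rng = random.Random(seed)
    freqs = [p0] * n_pops  # identical starting conditions everywhere
    history = [list(freqs)]
    for _ in range(n_generations):
        freqs = [sum(rng.random() < p for _ in range(pop_size)) / pop_size
                 for p in freqs]
        history.append(list(freqs))
    return history

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

history = drift(pop_size=100, n_pops=50, n_generations=50)
spread = [variance(freqs) for freqs in history]

# Among-population variance at the start, after 1 and after 50 generations.
print(spread[0], spread[1], spread[50])
```

All populations begin with the same frequency, yet the variance among them rises steadily, and nothing about the present environment predicts which population carries which frequency. Explaining any one population's state requires following its own sequence of sampling events.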

It is also possible that historical change is generated by abiotic environmental 
factors (Valentine and Moores, 1972). Long-term trends in evolution could result 
from the accurate adaptive tracking of a slowly changing environment. For
example, during the last hundred million years, there has been a long-term increase
in the degree of armoring of many marine invertebrates living on rocky substrates 
and a parallel increase in the size and strength of feeding organs among their 
predators (Vermeij, 1987; Jackson, 1988). It is possible that these biotic trends 
have been caused by long-run environmental changes over the same period—for 
example, an increase in the oxygen content of the atmosphere (Holland, 1984). 
Similarly, beginning perhaps as much as 17,000 years ago, humans began a shift 
from migratory big game hunting to sedentary, broad-spectrum, more
labor-intensive foraging, finally developing agriculture about 7,000 years ago (Henry,
1989). Many authors (e.g., Reed, 1977) have argued that the transition from 
glacial to interglacial climate that occurred during the same period is somehow 
responsible for this change. Similarly, differences among populations in similar 
environments may result from the environments actually being different in some 
subtle but important way. For example, Westoby (1989) has argued that some 
of the unusual features of the Australian biota result from the continent-wide 
predominance of highly weathered, impoverished soils on this relatively
undisturbed continental platform. Perhaps the failure of agriculture to develop in or
diffuse to aboriginal Australia, despite many favorable preconditions and the
presence of cultivators just across the Torres Strait, also reflects poor soils.

It is more difficult to understand how adaptive processes like natural
selection can give rise to historical trajectories. There are two hurdles. First,
adaptive processes in both organic and cultural evolution appear to work on 
rather short timescales compared to the timescales of change known from the 
fossil record, archaeology, and history. Theory, observation, and experiment 
suggest that natural selection can lead to change that is much more rapid than 
any observed in the fossil record (Levinton, 1988:342-347). For example, the 
African Great Lakes have been the locus of spectacular adaptive radiations of 
fishes amounting to hundreds of highly divergent forms from a few ancestors in 
the larger lakes (Lowe-McConnell, 1975). The maximum timescales for these 
radiations, set by the ages of the lakes and not counting that they may have dried 
up during the Pleistocene epoch, are only a few million years. The radiation in 
Lake Victoria (about 200 endemic species) seems to have required only a few 
hundred thousand years. Adaptive cultural change driven by decision-making 
forces can be very fast indeed, as is evidenced by the spread of innovations 
(Rogers, 1983). It is not immediately clear how very short timescale processes 
such as these can give rise to longer-term change of the kind observed in both 
fossil and archaeological records, unless the pace of change is regulated by
environmental change. In the absence of continuing, long-term, nonstationary
environmental change, adaptive processes seem quite capable of reaching equilibria
in relatively short order. In other words, both cultural and organic evolution 
seem, at first glance, to be classic scientific processes that produce functional 
adjustments too rapidly to account for the slow historical trajectories we actually 
observe. 

Second, it is not obvious why adaptive processes should be sensitive to initial 
conditions. Within anthropology the view that adaptive processes are ahistorical 
in this sense underpins many critiques of functionalism. Many anthropologists 
claim that it is self-evident that cultural evolution is historical and that,
therefore, adaptive explanations (being intrinsically equilibrist and ahistorical) must
be wrong. For example, Hallpike (1986) presents a variety of data that show that
peoples living in similar environments often have quite different social
organization, and historically related cultures often retain similar social organizations
despite occupying radically different environments. Because functionalist models 
predict a one-to-one relationship between environment and social organization, 
he argues, these data falsify the functionalist view. Indeed, functionalists like 
Cohen (1974:86) expect to see history manifest only in the case of functionally 
equivalent symbolic forms. Biologists have generally been more aware that a 
population’s response to selection depends on phylogenetic and developmental 
constraints and, therefore, that evolutionary trajectories are, at least to a degree, 
path-dependent. Nonetheless, lack of convergence is sometimes used to argue 
the lack of importance of natural selection. Should selection not cause
populations exposed to similar environments to converge on similar adaptations?
Certainly, some striking convergences from unlikely ancestors do exist. 

Here we argue that path dependence and long-term change are likely to be 
consequences of any adaptive process analogous to natural selection. Our claims 
are rather general and are thus independent of the nature of the transmission 
process (genetic or cultural) and of the details of development. Let us begin with
an especially simple example of genetic evolution. Consider a large population of
organisms in which individuals’ phenotypes can be represented as a number of
quantitative characters. Let us assume that there are no constraints on what can
evolve due to properties of the genetic system itself. One model with this
property assumes that the distribution of additive genetic values (note 2) for each
character is Gaussian, that there are no genetic correlations among characters, that no
genotype-environment interactions exist, and that mutation maintains a constant 
amount of heritable variation for each character. Further, assume that the fitness 
of each individual depends only on its own phenotype, not on the frequency of 
other phenotypes or the population density, and there is no environmental 
change. With these assumptions, it can be shown that the change in the vector of 
mean values for each character is along the gradient of the logarithm of average 
fitness (Lande, 1979). In other words, the mean phenotype in the population 
changes in the direction that maximizes the increase in the average fitness of the 
population. This is the sort of situation in which selection, and similar processes 
in the cultural system, ought to produce optimal adaptations in the
straightforward manner depicted in elementary textbooks.
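The gradient result can be written compactly. The notation here is ours and is only an assumed shorthand for the verbal statement above: $\bar{\mathbf{z}}$ is the vector of character means, $\bar{W}$ is mean fitness, and $G_i > 0$ is the additive genetic variance of character $i$, held constant by mutation, with all genetic covariances set to zero.

```latex
\Delta \bar{\mathbf{z}} \;=\; \mathbf{G}\,\nabla \ln \bar{W}(\bar{\mathbf{z}}),
\qquad
\mathbf{G} = \operatorname{diag}(G_{1}, \dots, G_{n}),
\\[6pt]
\Delta \ln \bar{W} \;\approx\; (\nabla \ln \bar{W})^{\mathsf{T}}\, \Delta \bar{\mathbf{z}}
\;=\; \sum_{i=1}^{n} G_{i} \left( \frac{\partial \ln \bar{W}}{\partial \bar{z}_{i}} \right)^{2} \;\ge\; 0 .
```

To first order, then, mean fitness never decreases, and the mean phenotype stops changing only where the gradient vanishes. If the $G_i$ differ among characters, the step is a rescaled version of the gradient; it points exactly along the gradient when all the $G_i$ are equal.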

In this simple model the evolutionary trajectory of the population will be 
completely governed by the shape of average fitness as a function of mean
phenotype. If the adaptive topography has a unique maximum, then every
population will evolve to the same equilibrium mean phenotype, independent of its
starting position, and, once there, be maintained by stabilizing selection. On the 
other hand, if there is more than one local maximum, different equilibrium 
outcomes are possible depending on initial condition. The larger the number of 
local maxima, the more path-dependent the resulting trajectories will be (see 
figure 15.1). 
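The contrast between a single-peaked and a many-peaked topography can be made concrete with a small numerical sketch. Both surfaces below are hypothetical functions chosen only for illustration (they are not the surfaces drawn in figure 15.1), and the step size and starting points are likewise assumptions. The population mean simply follows the local gradient of log mean fitness.

```python
import math

def grad_unimodal(x, y):
    """Gradient of ln W = -(x^2 + y^2): one peak at the origin."""
    return (-2.0 * x, -2.0 * y)

def grad_multimodal(x, y):
    """Gradient of ln W = cos(3x) + cos(3y) - 0.1 (x^2 + y^2): many peaks."""
    return (-3.0 * math.sin(3.0 * x) - 0.2 * x,
            -3.0 * math.sin(3.0 * y) - 0.2 * y)

def climb(grad, start, rate=0.05, steps=500):
    """Move the mean phenotype uphill along the local gradient."""
    x, y = start
    for _ in range(steps):
        gx, gy = grad(x, y)
        x += rate * gx
        y += rate * gy
    return (x, y)

# Two populations that begin with nearly the same mean phenotype.
starts = [(1.00, 0.2), (1.15, 0.2)]
uni_ends = [climb(grad_unimodal, s) for s in starts]
multi_ends = [climb(grad_multimodal, s) for s in starts]

print("unimodal:  ", uni_ends)
print("multimodal:", multi_ends)
```

On the unimodal surface both populations arrive at the same peak. On the multimodal surface the two starting points lie on opposite sides of a fitness valley, so the same uphill rule carries them to different peaks. Nothing other than the starting position distinguishes the two outcomes.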

Unfortunately, we do not know what real adaptive topographies look like, 
and, as Lande (1986) has pointed out, there is little chance that we will be able 
to determine their shape empirically. In evolutionary texts, adaptive
topographies are commonly depicted as a smooth three-dimensional surface with a
small number of local maxima. However, if evolutionary “design problems” are
similar to engineering ones, this picture is misleading. Experience with
engineering design problems suggests that real adaptive topographies are often
extremely complex, with long ridges, multiple saddle points, and many local 
optima—more akin to the topographic map of a real mountain range than the 
smooth textbook surfaces. 

A computer design problem discussed by Kirkpatrick, Gelatt, and Vecchi 
(1983) provides an excellent example. Computers are constructed from large 
numbers of interconnected circuits, each with some logical function. Because the 
size of chips is limited, circuits must be divided among different chips. Because 
signals between chips travel more slowly and require more power than signals 
within chips, designers want to apportion circuits among chips so as to minimize 
the number of connections between them. For even moderate numbers of circuits, 
there is an astronomical number of solutions to this problem. Kirkpatrick et al. 
present an example in which the 5,000 circuits that make up the IBM 370
microprocessor were to be divided between two chips. Here there are about 10^1503
possible solutions!

Figure 15.1. Two adaptive topographies. The axes are the mean genetic values in a
population for two characters. The contour lines are contours of equal mean
fitness. Figure 15.1a shows a simple unimodal adaptive topography: populations
beginning at different initial states all achieve the same equilibrium state.
Figure 15.1b shows a complex, multimodal topography: initially similar
populations diverge owing only to the influence of selection.

This design problem has two important qualitative properties:


1. It has a very large number of local optima. That is, there is a large number of 
arrangements of circuits with the property that any simple rearrangement
increases the number of connections between chips. This means that any search
process that simply goes uphill (like our model of genetic evolution) can end up
at any one of a very large number of configurations. An unsophisticated
optimizing scheme will improve the design only until it reaches one of the many
local optima, which one depending upon starting conditions. For example, for
the 370 design problem, several runs of a simple hill-climbing algorithm
produced between 677 and 730 interconnections. The best design found (using a
more sophisticated algorithm) required only 183 connections. 

2. There is a smaller, although still substantial, number of arrangements with 
close to the optimal number of interconnections. That is, there are many
qualitatively different designs that have close to the best payoff. In the numerical
example there are on the order of √5,000 ≈ 70 such arrangements.
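The first property can be explored on a toy version of the partitioning problem. The sketch below is not the procedure of Kirkpatrick et al. and does not attempt to reproduce their figures. It assumes a small random wiring diagram, two chips of equal size, and a hill climber whose only move is to swap one circuit from each chip; all function names and sizes are hypothetical.

```python
import itertools
import random

def random_netlist(n_circuits, n_wires, rng):
    """Hypothetical wiring diagram: n_wires distinct circuit pairs."""
    wires = set()
    while len(wires) < n_wires:
        a, b = rng.sample(range(n_circuits), 2)
        wires.add((min(a, b), max(a, b)))
    return sorted(wires)

def cut_size(chip_of, wires):
    """Number of wires that join circuits on different chips."""
    return sum(1 for a, b in wires if chip_of[a] != chip_of[b])

def random_start(n_circuits, rng):
    """Random assignment with half of the circuits on each chip."""
    chip_of = [0] * (n_circuits // 2) + [1] * (n_circuits - n_circuits // 2)
    rng.shuffle(chip_of)
    return chip_of

def hill_climb(chip_of, wires):
    """Accept any swap across chips that lowers the cut; stop when none does."""
    chip_of = list(chip_of)
    best = cut_size(chip_of, wires)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(chip_of)), 2):
            if chip_of[i] == chip_of[j]:
                continue
            chip_of[i], chip_of[j] = chip_of[j], chip_of[i]
            trial = cut_size(chip_of, wires)
            if trial < best:
                best = trial
                improved = True
            else:
                chip_of[i], chip_of[j] = chip_of[j], chip_of[i]  # undo swap
    return chip_of, best

rng = random.Random(1)
wires = random_netlist(30, 90, rng)
runs = []
for _ in range(8):
    start = random_start(30, rng)
    final, cut = hill_climb(start, wires)
    runs.append((cut_size(start, wires), final, cut))

# Starting and final numbers of between-chip connections for each run.
print([(start_cut, cut) for start_cut, _, cut in runs])
```

Each run ends at an arrangement that no single swap can improve, but the arrangement reached, and usually the number of connections it requires, depends on where the run began. This is the sense in which a process that only goes uphill is sensitive to starting conditions.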


Kirkpatrick et al. (1983) show that two other computer design problems, the 
arrangement of chips on circuit boards and the routing of wiring among chips, have 
similar properties. These three computer design problems are not unlike evolu¬ 
tionary “design” problems in biology—the localization of functions in organs, the 
arrangement of organs in a body, and the routing of the nervous and circulatory 
networks—that are likely to generate complex adaptive topographies. Moreover, as 
anyone experienced with the numerical solution of real-world optimization prob¬ 
lems will testify, these results are quite typical. To quote from the introduction of 
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a recent textbook on optimization, “many common design problems, from re¬ 
servoirs to refrigerators, have multiple local optima, as well as false optima, that 
make conventional [meaning iterative hill-climbing] optimization schemes risky” 
[Wilde, 1978]. Thus, if the analogy is correct, small differences in initial conditions 
may launch different populations on different evolutionary trajectories, which end 
with qualitatively different equilibrium phenotypes. 

It is important to see that this history-generating property does not depend 
on the existence of genetic or developmental constraints. At least as defined in 
Maynard Smith et al. (1985), there are no genetic or developmental constraints in
the simple model of selection acting on a complex topography. Every
combination of phenotypes can be achieved, and there is no bias in the production of
genetic variation. Path dependence results from the facts that different
characters interact in a complex way to generate fitness and that the direction of
natural selection depends on the shape of the local topography. 

Of course, developmental constraints could also play a major role in
confining lineages to historically determined bauplans, as many biologists have
argued (e.g., Seilacher, 1970). Further, complex topographies and developmental
constraints may be related. Wagner (1988) hypothesizes, based on a model of
multivariate phenotypic evolution, that fitness functions will generally be
“malignant” and that developmental constraints act to make phenotypes more
responsive to selection. By malignant, Wagner means that the fitness of any one
trait is likely to depend on the values of many other traits. For example, larger 
size may be favored by selection for success in contests for mates but only if 
many traits of the respiratory, skeletal, and circulatory systems change in concert 
to support larger size. If phenotype is unconstrained, response to selection will 
be slow because of the need to change so many independent characters at once, 
whereas developmental constraints confine the expression of variation to a few 
axes that can respond rapidly to selection. Thus, the bill is a simple, rather 
constrained part of the anatomy of birds, yet selection has remodeled bills along 
the relatively few dimensions available (length, width, depth, curvature) to
support an amazing variety of specializations. Developmental constraints may 
be a solution to the complexity of adaptive topographies, albeit one that limits 
lineages to elaborating a small set of historically derived basic traits as they 
respond to new adaptive challenges. 

Path dependence can arise from the action of functional processes in a 
cultural system of inheritance as well. For example, decision-making forces arise 
when people modify culturally acquired beliefs in the attempt to satisfy some 
goal. If people within a culture share the same goal, this process will produce an 
evolutionary trajectory very similar to one produced by natural selection—the 
rate of change of the distribution of beliefs in a population will depend on the 
amount of cultural variation and the shape of an analog of the adaptive
topography in which fitness is replaced by utility (the extent to which alternative
beliefs satisfy the goal) (Boyd and Richerson, 1985, ch. 5). The details of the
transmission and selective processes are not crucial, as long as the processes that 
lead to change can be represented as climbing a complex topography. 

It is unclear whether adding genetic constraints will increase or decrease the 
potential for path dependence. One sort of genetic constraint can be added by
allowing significant genetic correlations among characters (Lande, 1986). This
assumption means that some mutants are more probable than others. As long as
there is some genetic variation in each dimension, the vector of phenotypic
means will still go uphill but not necessarily in the steepest direction. The
population will come to equilibrium at one of the local peaks, although this might
be quite distant from the equilibrium that the population would have reached 
had there been no genetic correlations (Lande, 1979, 1986). More generally, 
most genetic architectures do not result in Gaussian distributions of genetic 
values (Turelli and Barton, 1990), and analyses of two locus models suggest that 
dynamics resulting from the combination of linkage and selection may create 
many locally stable equilibria even when the fitness function is unimodal (Karlin 
and Feldman, 1970). This suggests that adding more genetic realism would
increase the potential for path dependence. On the other hand, computer scientists
(Holland, 1975; Brady, 1985) have found that optimization algorithms closely 
modeled on multilocus selection are less likely to get stuck on local optima than 
simple iterative hill-climbing algorithms. The issue of genetic constraints is still 
open. 

The situation in cultural evolution is similar, even if not so well studied. On 
the one hand, many anthropologists stress the rich structure of culture. To the 
extent that such structure exists, path dependence is likely to be important. On 
the other hand, Bandura (1977), a pioneering student of the processes of social 
learning, argues that there is relatively little complex structuring of socially 
learned behavior. The many examples of cultural syncretism and diffusion of 
isolated elements of technology suggest his view ought to be taken seriously. 
Perhaps complex structure is most important in the symbolic aspects of culture, 
but symbolic variation may be only weakly constrained by functional
considerations (Cohen, 1974). According to Cohen, we have to use purely
contingent historical explanations for things such as linguistic variation, while simple
functional explanations suffice for economic, political, and social-organizational 
phenomena. 

Long-term nonstationary trends in evolution can result if there is some 
process that causes populations to shift from one peak to another and if that 
process acts on a longer time scale than adaptive processes like natural selection. 
So far we have assumed that populations are large and the environment is
unchanging. With these assumptions, populations will usually rapidly reach an
adaptive peak and then stay there indefinitely. They will not exhibit the kind of 
long-run change that we have required for change to be historical. Wright (e.g., 
1977) long argued that drift plays an important role in causing populations to 
shift from peak to peak, and then competition among populations favors the 
population on the higher peak. Chance variations in gene frequency in small 
populations could lead to the occasional crossing of adaptive valleys and the 
movement to higher peaks. Recently, several authors have considered
mathematical models of this process (Barton and Charlesworth, 1984; Newman,
Cohen, and Kipnis, 1985; Lande, 1986; Crow, Engels, and Denniston, 1990). These
studies suggest that the probability that a shift to a new peak will occur during 
any time period is low; however, when a shift does occur, it occurs very rapidly. 
If this view is correct, drift should generate a long-run pattern of change in which 
populations wander haltingly up the adaptive topography from lower local peaks
to higher ones. It is also implausible that environments remain constant either in 
space or in time. As environments change, the shape of the adaptive topography
shifts, causing peaks to merge, split, or disappear, or causing temporary ridges to
appear that connect a lower peak to a higher one. Thus, populations will occasionally
slide from one peak to another. As long as such events are not too common, 
environmental change will also lead to long-run change. Such change might 
appear gradual if there are many small valleys to cross or punctuational if there 
are a few big ones. 

Adding social or ecological realism to the basic adaptive hill-climbing model 
of evolution probably increases the potential for multiple stable equilibria. In the 
simple model, an individual’s fitness depended only on his phenotype. When 
there are social or ecological interactions among individuals within a population, 
individual fitness will depend on the composition of the population as a whole. 
When this is the case, evolutionary dynamics can no longer be represented in 
terms of an invariant adaptive topography. However, they may still be
characterized by multiple stable equilibria. Moreover, the fact that many quite simple
models of frequency dependence have this property suggests that frequency 
dependence may usually increase the potential for path-dependent historical 
change. 

Models of the evolution of norms provide an interesting example of how 
frequency dependence can multiply the number of stable equilibria. Hirshleifer 
and Rasmusen (1989) have analyzed a model in which a group of individuals
interact over a period of time. During each interaction, individuals first have the 
opportunity to cooperate and thereby produce a benefit to the group as a whole 
but at some cost to themselves; they then have a chance to punish defectors at no 
cost to themselves. These authors show that strategies in which individuals
cooperate, and punish noncooperators and nonpunishers, are stable in the
game-theoretic sense. However, they also show that punishment strategies of this kind
can stabilize any behavior—cooperation, noncooperation, wearing white socks, 
or anything else. We (chapter 9) show that the same conclusions apply in an 
evolutionary model even when punishment is costly. This form of social norm 
can stabilize virtually any form of behavior as long as the fitness cost of the 
behavior is small compared to the costs of being punished. 

More generally, coordination is an important aspect of several kinds of social 
interactions (Sugden, 1986). In a pure game of coordination, it does not matter 
what strategy is used, as long as it is the strategy that is locally common. Driving 
on the left versus right side of the road is an example. It does not matter which 
side we use, but it is critical that we agree on one side or the other. This property 
of arbitrary advantage to the common strategy is shared by many symbolic and 
communication systems and allows multiple equilibria whenever there are
multiple conceivable strategies. In many other common kinds of social interactions,
elements of coordination and conflict are mixed. In such games, all individuals 
are better off if they use the same strategy, even though the relative advantages 
of using the strategy differ greatly from individual to individual, and some
individuals would be much better off if another strategy were common. As long as
the coordination aspect of such interactions is strong enough, multiple stable 
equilibria will exist. Arthur (1990) shows how locational decisions of industrial
enterprises could give rise to historical patterns due to coordination effects. It is 
often advantageous for firms to locate near other firms in the same industry 
because specialized labor and suppliers have been attracted by preexisting firms. 
The chance decisions of the first few firms in an emerging industry can establish 
one as opposed to another area as the Silicon Valley of that industry. More 
generally, historical patterns can arise in the many situations where there are 
increasing returns to scale in the production of a given product or technology. 
Merely because the “qwerty” keyboard is common, it is sensible to adopt it 
despite its inefficiencies. 
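
The way a pure coordination game produces two stable equilibria can be sketched with a simple frequency-dependent update. The replicator-style rule, the baseline payoff, and the starting frequencies below are assumptions chosen for illustration; the text does not commit to any particular dynamic. Each individual gains a payoff of 1 when it meets someone using the same convention (say, driving on the left) and 0 otherwise.

```python
def next_frequency(p, baseline=1.0):
    """One generation of a replicator-style update for a pure coordination game.

    p is the frequency of the 'left' convention. An individual's payoff is the
    baseline plus the chance of meeting someone who uses the same convention.
    """
    w_left = baseline + p
    w_right = baseline + (1.0 - p)
    mean_w = p * w_left + (1.0 - p) * w_right
    return p * w_left / mean_w

def run(p0, generations=200):
    """Iterate the update from an initial frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p)
    return p

mostly_right = run(0.49)  # 'left' starts slightly rarer and disappears
mostly_left = run(0.51)   # 'left' starts slightly commoner and takes over
```

Neither convention is intrinsically better in this sketch. Whichever happens to be slightly more common at the start goes to fixation, so the final state records an accident of history.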

Interactions between populations and societies (or elements within societies 
such as classes) can give rise to multiple stable equilibria. Models of the
coevolution of multiple populations have many of the same properties as
frequency- and density-dependent selection within populations, although the
theory is less well developed (Slatkin and Maynard Smith, 1979). The evolution 
of one population or society depends upon the properties of others that interact 
with it, and many different systems of adjusting the relationships between the 
populations may be possible. For example, Cody (1974:201) noted that
competing birds replace each other along an altitudinal gradient in California but
latitudinally in Chile. Given the rather similar environments of these two places, 
it is plausible that both systems of competitive replacement are stable and which 
one occurs is due to accidents of history. 

The stratification of human societies into privileged elites and disadvantaged 
commoners derives from the ability of elites to control high-quality resources 
or to exploit commoners using strategies that are similar to competitive and 
predatory strategies in nature. Insko et al. (1983) studied the evolution of social 
stratification in the social psychology laboratory. They showed that elites could 
arise in both an experimental condition that mimicked freely chosen trade
relations and one that mimicked conquest. Elites were approximately as well off in
both conditions and, insofar as they controlled things, would have no motivation 
to change social arrangements. It seems plausible that the diversity of political 
forms of complex societies could result from many arrangements of relations 
between constituent interest groups being locally stable. The distinctive
differences between the Japanese, American, and Scandinavian strategies for operating
technologically advanced societies could well derive from historic differences in 
social organization that have led to different, stable arrangements between
interest groups, in spite of similar revolutionary changes in production techniques
of the last century or two. 

Social or ecological interactions may also give rise to dynamic processes that 
are sensitive to initial conditions and have no stable equilibria. Lande (1981) 
analyzed a model of one such process, sexual selection in which females have a 
heritable preference for mates that is based on a heritable, sex-limited male 
character. According to his model, when the male character and female
preferences are sufficiently correlated genetically, female choice can create a
self-reinforcing “runaway” process that causes the mean male character and the mean
female preference either to increase or decrease indefinitely, even in the presence 
of stabilizing selection on the male character. Selection cannot favor female 
variants that choose fitter males (in the usual sense of fitter) because most females
are choosing mates with an exaggerated character. The “sensible” female’s sons
will be handicapped in the mating game. The direction that evolution takes 
depends on the details of the initial conditions in Lande’s model. His quantitative 
character will be elaborated in one direction or the other depending on how 
evolution drifts away from an unstable line of equilibria. Although the
interpretation of this model is controversial, it is easy to imagine that the exaggerated
characters of polygynous animals like birds of paradise and peacocks result from 
the runaway process. We (Boyd and Richerson, 1985, ch. 8, 1987; Richerson and
Boyd, 1989a) have argued that quite similar processes may arise in cultural
evolution when individuals are predisposed to imitate some individuals on the
basis of culturally heritable characteristics. The use of some character associated
with prestige (stylish dress, for example) as an index of whom to imitate has the
same potentially unstable runaway dynamic as Lande’s model of mate choice
sexual selection, and even casual observation suggests that prestige systems do
follow contingent historical trajectories. Fashions in clothing, for example, evolve
in different directions in different societies, often without much regard for
practicality.

Figure 15.2. Both parts show the trajectories of population growth generated by the
same model of social evolution for two slightly different initial population sizes.
In 15.2a the society goes through three distinct phases of growth, while in 15.2b there are
only two.

Perhaps the most clearly historical patterns of change result when social or 
ecological interaction leads to “chaotic” dynamics. For example, Day and Walter 
(1989) have analyzed an extremely interesting model of social evolution in 
which population growth leads to reduced productivity, social stratification, and 
eventually to a shift from one subsistence technology to a more productive one. 
The resulting trajectories of population size are shown in figure 15.2. Population 
grows, is limited by resource constraints, and eventually technical substitution
occurs, allowing population to grow once more. The only difference between
figure 15.2a and 15.2b is a very small difference in initial population size.
Nonetheless, this seemingly insignificant difference leads to qualitatively different
trajectories: one society shows three separate evolutionary stages, and the
second only two. 
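
Sensitivity to initial conditions of this kind can be illustrated with a much simpler system than the one Day and Walter analyze. The sketch below uses the logistic map as a stand-in; it is not their model, and the growth parameter, starting values, and number of steps are assumptions chosen only to show two nearly identical starting points producing very different trajectories.

```python
def logistic_trajectory(x0, r=4.0, steps=100):
    """Iterate the logistic map x -> r * x * (1 - x), a textbook chaotic system.

    x is a scaled population size between 0 and 1; r = 4.0 lies in the
    chaotic regime of the map.
    """
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two initial population sizes that differ by one part in a billion.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-9)

first_gap = abs(a[1] - b[1])                        # still tiny after one step
max_gap = max(abs(x - y) for x, y in zip(a, b))     # grows large later on
```

The two runs are indistinguishable at first and then part company completely, so knowing the rule exactly does not permit long-range prediction unless the starting point is also known exactly.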


Conclusion 

Scientific and historical explanations are not alternatives. Contingent, diverging 
pathways of evolution and long-term secular trends can result from processes 
that differ only slightly from those that produce rapid, ahistorical convergence to 
universal equilibria. Late nineteenth- and early twentieth-century scientists gave 
up restricting the term “scientific” to deterministic, mechanistic explanations
and began to admit “merely” statistical laws into the fundamental corpus of 
physics (very reluctantly in some cases—recall Einstein’s famous complaint 
about God not playing dice with the universe to express his distaste for quantum 
mechanics). Similarly, historical explanations cannot be distinguished from other 
kinds of scientific explanations except that some models (and, presumably, the 
phenomena they represent) generate trajectories that meet our definition of 
being historical. These history-generating processes do not depend on exotic 
forces or immaterial causes that ought to excite a scientist’s skepticism; perfectly 
mundane things will do. There are challenging complexities in historical
processes. For example, even well-understood processes will not allow precise
predictions of future behavior when change is historical. However, all the tools 
of conventional scientific methods can be brought to bear on them. For example, 
it should be possible to use measurement or experiment to determine if a
process is in a region of parameter values where chaotic behavior is expected. At
the same time, the historian’s traditional concern for critically dissecting 
the contingencies that contribute to each unique historical path is well taken. 
Process-oriented “scientific” analyses help us understand how history works, and 
“historical” data are essential to test scientific hypotheses about how
populations and societies change.

In the biological and social domains, “science” without “history” leaves
many interesting phenomena unexplained, while “history” without “science”
cannot produce an explanatory account of the past, only a listing of disconnected 
facts. The generalizing impulses of science require historical methods, because 
the phenomena to be understood are genuinely historical and because historical 
data are essential for developing generalizations about evolutionary processes. In 
return, generalizations derived from history and by the study of contemporary 
systems would seem to be essential for an understanding of particular cases. The 
amount of data available from the past is usually very limited, and the number of 
possible reconstructions of the past is correspondingly large. Some sort of theory 
has to be applied to make some sense of the isolated facts. Historians (e.g., 
Braudel, 1972) and paleontologists (e.g., Valentine, 1973) often cast their nets 
rather widely in search of help in interpreting the documents and fossils. McNeill 
(1986) advocates a “scientific,” generalization-seeking approach to history much
in this spirit. Consider the question of which of the potential history-producing 
processes we have discussed are most important in explaining the changes in 
human societies over the last few tens of thousands of years. Generalizing
disciplines such as climatology and cultural ecology are certainly relevant to the task
in general and to the understanding of how particular societies changed in
particular environments (Henry, 1989). At the same time, because these historical
societies faced Pleistocene climates and the transition to the Holocene, and 
because they developed a series of technical, social, and ideological innovations 
that are the foundation of modern human societies by processes that are not 
open to direct observation, the historical and archaeological records provide 
crucial data not available from ahistorical study. To the extent that the processes 
we have described are important, “science” and “history” cannot be
disentangled as separate intellectual enterprises.

Darwinian models of organic and cultural evolution illustrate how little 
distinction can be made between the two approaches. Such models can produce 
historical patterns of change by a rather large number of different mechanisms. 
We have argued that historical change is distinguished by two attributes: the 
tendency of initially similar systems to diverge and the occurrence of long-term 
change. Evolutionary models, including those that assume that selection or 
analogous cultural processes increase adaptiveness in each generation, readily 
generate multiple stable equilibria. Populations with similar initial conditions 
may evolve toward separate equilibria. Random genetic drift and analogous
cultural processes, coupled with environmental change, may cause populations to
shift from one equilibrium to another. It is plausible that peak shifting by
populations (or the shifting of peaks due to environmental change) occurs at a slow
enough rate to explain long-term secular trends. 

Many anthropologists take as their task the explanation of differences among 
human societies and suggest that most such differences are historical in
character. If explanation of such variation is mainly historical, then anthropologists
might reasonably ask, what is the point of Darwinian models of cultural change
when historical or “contextual” explanations will be much more productive?
The reasons are as follows. 

First, the premise is often incorrect. Genuine convergences are common and 
explaining them requires some theory based on common processes of cultural 
change. Perhaps the most spectacular cultural example is the convergence of 
social organization in stratified, state-level societies in the Old and New Worlds. 
For example, Cortez in 1519 found that Aztec society was quite similar to his own 
in important ways: it contained familiar roles, hereditary nobility, priests,
warriors, craftsmen, peasants, and so on. The bureaucracy was organized
hierarchically. This convergence is remarkable because the Spanish and Aztec states
evolved independently from a hunter-gatherer ancestry. The cultural lineages 
that resulted in these two states were without known cultural contact for several 
thousand years before state formation began in either (Wenke, 1980). 

Second, Darwinian models can make useful predictions. They can tell us 
why some forms of behavior or social organization are never observed and others 
are common. For example, kinship is an extremely common principle of social 
organization. Contrarily, there would seem to be lots of advantages to a free 
market in babies—for the individual, it would allow easy adjustment of family 
size, age composition, sex ratio, and so on, and for society, a division of labor in 
child rearing would allow better use of human resources. The sociobiological 
theory of kin selection explains why there are no societies with free trade in 
infants and why kinship is generally an important feature of social organization. 
If most of the historic context is taken as given, Darwinian arguments can be 
very powerful heuristics. This is especially clear for genetic evolution. For
example, given haplodiploidy, a theory based directly on the expected equilibrium
outcome of natural selection can make surprising and extremely fruitful
predictions about patterns of behavior in social insects. Who, for example, would
have thought to connect sex ratio among reproductives and “slave making” in
ant species? In recent years, similar ideas have been usefully applied to
understanding human behavior. For example, Hill, Kaplan, and their colleagues
(reviewed in Hill and Kaplan, 1988) have used theory from behavioral ecology to
relate patterns of foraging, mate preference, and child care among Ache
hunter-gatherers, and Borgerhoff-Mulder (1988) has explained variation in bride price
among Kipsigis pastoralists in terms of parameters that predict future female 
fitness. 

Finally, it is useful in and of itself to know that even the most strongly 
functional Darwinian models can give rise to historical change. The same
processes that give rise to convergence in one case can generate differences in
another, given only small changes in the structure of the process or in initial
conditions. Brandon (1990) argues that “why possibly” explanations are useful 
in evolutionary biology. By this, he means explanations that tell us how some 
character could have evolved are useful even if we cannot determine whether the 
explanation is true. The theoretical models in population genetics provide a good 
example: Hamilton’s (1964) kin selection models show how natural selection 
could give rise to self-sacrificial behavior. However, we usually do not know 
whether any particular case of altruism arose as a result of kin selection. The lack 
of any “why possibly” explanation would cast doubt on other aspects of our
knowledge of how selection shapes behavior. 

Understanding how adaptive processes could give rise to historical change 
is useful for analogous reasons. There is considerable evidence that people’s 
choices about what to believe and what to value are affected by the consequences 
in material well-being, social status, and so on (e.g., Boyd and Richerson, 1985). 
This view has a venerable history in anthropology (e.g., Barth, 1981; Harris, 
1979), plays a foundational role in economics, and is taken for granted in many
historians’ explanations for particular sequences of events. If cultural change is 
affected by consequence-driven individual choice or natural selection, then it 
follows that there will be a process that will act to modify the distribution of 
cultural variation in a population in much the same way that natural selection 
changes genetic variation (Boyd and Richerson, 1985, chs. 4 and 5). The fact that 
functional processes like natural selection readily lead to history allows one to hold 
this view without having necessarily to search for external environmental
differences to explain the differences among apparently similarly situated human
societies. 


Notes

We thank James Griesemer, Matthew H. Nitecki, Eric A. Smith, and two
anonymous reviewers for most helpful comments on previous drafts of this chapter.

1. This project is quite different from the better-known, classical studies of 
cultural evolution developed by Leslie White (1959) and other scholars in
anthropology. This work focused descriptively on the large-scale patterns of cultural
evolution rather than on the details of the processes by which cultural evolution occurs
(Campbell, 1965, 1975). The research tradition White represents derives from the 
progressivist ideas of Herbert Spencer, rather than from Darwin. 

2. The additive genetic value of a particular individual for a particular character 
is the average value of that character for offspring produced when that individual 
mates at random with a large number of other individuals in the population. For 
example, the additive genetic value of a bull for fat content is the average fat content 
of all its offspring where mates were chosen at random. The distribution of genetic 
values is Gaussian when the probability that an individual has a given genetic value is 
given by the normal (or Gaussian) probability distribution. Genetic correlations exist 
when the distributions of genetic values for different characters are not
probabilistically independent. For example, if bulls with a higher genetic value for
size also tend to have a higher genetic value for fat content, then body size and
fat content are genetically correlated. Genotype-environment correlations arise when individuals with
the same genotype develop different phenotypes in different environments. 

16 Are Cultural Phylogenies Possible?

With Monique Borgerhoff Mulder and 
William H. Durham 


Biology and the social sciences share an interest in phylogeny. 
Biologists know that living species are descended from past species and use the 
pattern of similarities among living species to reconstruct the history of
phylogenetic branching. Social scientists know that the beliefs, values, practices, and
artifacts that characterize contemporary societies are descended from past
societies, and some social science disciplines (e.g., linguistics and cross-cultural
anthropology) have made use of observed similarities to reconstruct cultural
histories. Darwin appreciated that his theory of descent, with modification, had 
many similarities of pattern and process to the already well-developed field of 
historical linguistics. In many other areas of social science, however,
phylogenetic reconstruction has not played a central role.

Phylogenetic reconstruction plays three important roles in biology. First, it 
provides the basis for classification. Entities descended from a common
ancestor share novel, or derived, characters inherited from that ancestor.
Therefore, it is possible to group them into hierarchically organized series of groups:
species, genus, family, order, and so on in the biological case.

Second, knowledge of phylogeny often allows inferences about history. 
The knowledge that humans are more closely related to chimpanzees and gorillas 
than to orangutans provides evidence that the human lineage arose in Africa. 
Phylogenetic reconstructions based on the characters of extant species or
cultures often allow us to reconstruct the history in the absence of a historical,
archaeological, or fossil record. In practice, the history of many biological
and cultural groups is so poorly known that only by combining phylogenetic 
and historical or archaeological information can reliable reconstructions be 
obtained. 

Third, entities descended from a common ancestor share features that may
constrain the pathways that more recent evolution has followed. For example,
selection for terrestrial locomotion may lead to quadrupedal locomotion in a small
monkey that runs along the tops of branches but to bipedal locomotion in a large 
arboreal ape that swings below branches (Foley, 1987). The latter pattern allows the 
hand to specialize in manipulative tasks and, on many accounts, is why the ape, but 
not the monkey lineage, eventually was able to produce a cultural species. 

The importance of descent is the crux of some of the deepest controversies 
of all the historical sciences. Some social scientists and biologists (e.g., Boyd and 
Richerson, 1992; Hallpike, 1986; Sahlins, 1976) have argued that history strongly 
constrains adaptation and, as a result, strictly limits adaptive interpretations of 
current behavior. As Francis Galton taught both biologists and social scientists in 
the nineteenth century, to account for the effects of common ancestry, the study 
of adaptation or function requires that patterns of descent be known. Our
inability to provide appropriate roles for history and function is a chronic source of
controversy. 

If the analogy is real, an interdisciplinary exchange of concepts and tools 
could pay great dividends. Social scientists may be particularly interested in the 
near-revolutionary developments in systematics (Ridley, 1986) and
comparative methods (Harvey and Pagel, 1991) developed by evolutionary biologists in
the last two decades. 

The purpose of this chapter is to examine the role of descent in culture
evolution theory. We believe that the critical question is whether human
cultures, or parts of them, are isolated from one another to the same degree as
biological entities like species and genes. Cultures are frequently characterized 
by sharp ingroup-outgroup boundaries (LeVine and Campbell, 1972) that may 
function to limit the flow of ideas from one population to another (Boyd and 
Richerson, 1987). However, there are also many examples of the diffusion of 
cultural traits across such boundaries (Rogers, 1983). Are the isolating processes 
sufficiently strong to provide at least a core of important cultural traits that are 
sufficiently protected from diffusion so that phylogenetic analysis is possible? 
If so, concepts and methods from biological systematics can be used to reconstruct 
the history of cultures. If not, human cultures are more like subspecies or local 
populations linked by gene flow than like reproductively isolated species. In this 
case, it may be useful to make separate phylogenies for each subunit of culture 
that is substantially protected from diffusion, in much the same way that modern 
molecular procedures are used to reconstruct the phylogeny of subgenomic 
units, especially individual genes. It may also be that there are no cultural units 
with sufficient coherence and therefore that phylogenetic methods are useless. 

We begin by reviewing the notions of descent used in evolutionary biology. 
Biologists have been making use of the concept of descent ever since Darwin, and 
they have developed a sophisticated appreciation for the concept and its problems 
that may be helpful in the human case. The complexity and diversity of biological 
systems of inheritance is wondrous to those brought up on the simple Mendelism 
of 20 years ago (Falk and Jablonka, 1997). Although it is likely that the process 
of cultural descent with modification is different from the analogous process in 
organic evolution, we believe that much can be learned from biologists’ century 
of hard work. We then consider data from the social sciences that indicate the
extent to which cultures form bounded wholes, analogous to species. Finally, we 
consider how the descent concepts, partly borrowed from biology, might be used 
to tackle important questions in the social sciences. 


Descent in Organic Evolution 

In biology, two different entities exhibit the clear patterns of descent with 
modification. The most familiar example is the species. The collection of
individuals who make up a species during any generation is descended, and
perhaps slightly modified, from the collection of individuals who made up the
species during the previous generation. A new species is formed by the splitting 
of an existing species. Then each of the daughter species is descended from the 
single ancestral species that gave rise to them. 

Much the same holds for genes one by one. Because genes result from the 
copying of DNA, every gene is descended from the gene that provided its
template. Modified genes arise from existing genes by mutation, recombination, and
gene conversion at a given locus. A genetic locus can give rise to another locus 
by duplicating itself on the chromosome, after which the daughter locus begins 
independent evolution. The relationships among genes are not simply the
relationships among the species that carry them (although this is often the case).
We can keep track of the relationship of genes within a single species (e.g., various
forms of hemoglobin within human populations). It is also possible to speak of
relationships among genes that are inconsistent with relationships among species.
For example, genes for globin molecules in vertebrates and certain plants seem to
share a more recent common ancestor than the genes in vertebrates and
arthropods, as surprising as this seems at first blush (Jeffreys et al., 1983).

Descent relationships are often represented using branching diagrams like 
that shown in figure 16.1. The diagram conveys the idea that both A and B are 
descended from an ancestor C. (Systematists use similar branching diagrams 
called cladograms to represent patterns of similarity without reference to time, 
or ancestor-descendant relationships; statistical clustering algorithms create 
treelike dendrograms also without any pretense to representing 
ancestor-descendant relationships. Tree diagrams are used here to represent phylogeny.) 
The same diagram is used to represent the relationship among different kinds of 
things. For biologists A, B, and C may represent species or genes. Social scientists 
use similar diagrams to express the relationship among languages, or other 
aspects of culture, often with the explicit intention of representing a phylogeny. 
What, if anything, do the descents of genes and species have in common? Can 
these commonalities provide some help in analyzing the descent of cultures, 
languages, and technologies? 

The Descent of Genes 

Figure 16.1. A hypothetical phylogeny in which species A and B are 
descended from species C. 

To answer this question, let us begin with the simpler case—the descent 
relationship among genes. If we ignore for a moment the possibility of recombination, 
every gene is a copy of another gene. Of course, that gene was the copy of yet 
another gene, and so on. Thus, if we pick any two genes, A and B, we can, in 
principle, trace back through a series of copies until we find a gene, C, that 
served as a template for both. We say that genes A and B are descended from C. 
If mutations have occurred, A or B may be different from C and each other. As 
long as mutations are rare and the gene includes enough bases, then genes that 
share more derived mutations are more likely to be related. Taxonomists use this 
fact to reconstruct the branching pattern among genes sampled from living 
species. Notice that there is nothing in the discussion that specifies that C, A, 
and B have to belong to the same (or different) species. The same argument 
would hold regardless of whether A and B are genes found within a single species 
or among distantly related species (e.g., humans and bean plants). 
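
The tracing procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the copying history (`template_of`) and the gene labels are hypothetical, recombination is ignored as in the text, and the function names are ours rather than anything taken from the literature. 

```python
from typing import Dict, List, Optional


def lineage(gene: str, template_of: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain gene -> its template -> ... back to the oldest known copy."""
    chain = [gene]
    while template_of.get(chain[-1]) is not None:
        chain.append(template_of[chain[-1]])
    return chain


def common_ancestor(a: str, b: str,
                    template_of: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the most recent gene that served as a template for both a and b."""
    ancestors_of_a = set(lineage(a, template_of))
    for gene in lineage(b, template_of):
        if gene in ancestors_of_a:
            return gene
    return None


# Hypothetical copying history: each key was copied from its value.
template_of = {"C": None, "c1": "C", "c2": "C", "A": "c1", "B": "c2"}

print(common_ancestor("A", "B", template_of))  # C
```

Nothing in the sketch records which species carries each gene, which mirrors the point that the argument holds within or across species. 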

Units with Reticulated Phylogenies 

Recombination—the shuffling of chromosomes, of the genes along a 
chromosome, and of the sequence within a gene—complicates matters because it leads to 
what cladists call reticulated phylogenies. Figure 16.2 shows the lineages of three 
genes. Recombination has occurred within the gene three times. After each 
recombination event, each of the daughter genes is a copy of part of each of the 
two parents. The daughter genes are no longer descended from the parental 
genes in the same way that they were in the absence of recombination. They are 
no longer almost exact copies of the parents; rather, they are partial copies of 
both parents. Further recombination events create yet more complicated 
patterns of relationship. After some time, every copy of the gene is related to a large 
number of other genes in some complicated way that utterly obscures descent. 
Recombination within a gene is rare, but recombination within chromosomes 
between different genes is quite common. Deep phylogenies can be 
reconstructed for genes, but only shallow ones for chromosomes. 


Figure 16.2. Recombination leads to complicated patterns of descent. Each string of 
letters represents a segment of the chromosome. Each generation each gene is 
replicated, sometimes with recombination. After four generations, each chromosome is 
partly descended from all three of the original chromosomes. 
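
The way recombination mixes ancestries can be illustrated with a small Python sketch. It is a toy model under our own assumptions: three hypothetical founder chromosomes labeled X, Y, and Z, seven segments each, and crossover points chosen by hand rather than at random. 

```python
def recombine(left, right, crossover):
    """Child copies `left` up to `crossover` and `right` from there on."""
    return left[:crossover] + right[crossover:]


def founders(chromosome):
    """Set of founder chromosomes that contributed at least one segment."""
    return set(chromosome)


SEGMENTS = 7
# Each segment is tagged with the (hypothetical) founder it was copied from.
x = ("X",) * SEGMENTS
y = ("Y",) * SEGMENTS
z = ("Z",) * SEGMENTS

gen1 = recombine(x, y, 3)     # segments from X, then from Y
gen2 = recombine(gen1, z, 5)  # adds segments from Z at the far end

print("".join(gen1), sorted(founders(gen1)))
print("".join(gen2), sorted(founders(gen2)))
```

After only two crossovers the second-generation chromosome is a partial copy of all three founders, so no single parent-to-offspring line of descent describes it. 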


Gene flow (migration) among subpopulations of a species has a similar 
effect. Any given local group will have acquired genes from many different local 
groups in the past. Even if most subpopulations are created by the subdivision of 
a single parental population, a relatively small rate of individual-level migration 
between subpopulations will carry genes evolved in one daughter subpopulation 
to its sisters. Fairly shortly, descent at the subpopulation level will be impossible 
to detect. Thus, there is a large range of genetic units ranging in size from 
roughly small chromosome segments to the subspecies for which phylogenetic 
analysis is usually impossible. 

Some large gene collections, such as mitochondrial genomes, are protected 
from recombination because they are transmitted asexually. Mitochondrial 
phylogenies of some depth can be constructed, although they illustrate another 
process that eliminates phylogenetic information in the long run. Mitochondria 
are subject to high mutation rates. In a matter of a few million years, every 
descendant pair of mitochondrial genes will have independently mutated more 
than once, and the traces of descent will be lost. Conservative genes like the 
cytochrome genes have slow rates of evolution and can be used to reconstruct 
phylogenetic relationships reaching back to near the origin of life, but these are 
exceptional. More typically, deep phylogenetic reconstructions based on less 
faithful structures are quite controversial even when we can be almost certain 
that recombination and migration have not confused the picture. 
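
The loss of signal under high mutation rates can be illustrated numerically. The sketch below assumes the Jukes-Cantor substitution model (all four bases interchange at equal rates), which is our choice for illustration and is not a model named in the text; `d` is the expected number of substitutions per site separating two sequences. 

```python
import math


def expected_difference(d: float) -> float:
    """Expected fraction of sites at which two sequences differ, given d
    substitutions per site between them (Jukes-Cantor model, assumed here)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


for d in (0.01, 0.1, 1.0, 5.0):
    print(f"d = {d:>4}: expected differing sites = {expected_difference(d):.3f}")
```

For small `d` the observed difference tracks the true amount of change, but as `d` grows the difference levels off near 3/4, the value expected for unrelated random sequences. Beyond that point, repeated mutations at the same sites leave almost no trace of how long ago two lineages split. 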

The Descent of Species 

Species and higher taxa are the classic focus of phylogenetic analysis in biology. 
Linnean systematists formalized the common observation that the organic world 
comes in readily observable clusters. Species and higher taxa seem to be 
separated by distinctive gaps that do not occur within species or among many other 
natural objects. Darwin’s theory of descent with modification gave a theoretical 
underpinning to the trees of relationships that Linnaeus had enshrined in a 
hierarchical classification system, although Darwin had little to say about the 
species-isolating mechanisms that enforce the gaps between species. His 
followers have made up for this deficiency; the issue of speciation is a major topic in 
modern evolutionary biology. 

In the basic picture constructed by the architects (e.g., Ernst Mayr) of the 
midcentury neo-Darwinian synthesis, species are created when a barrier to gene 
flow evolves to isolate two sets of populations. Once isolated, the evolution of 
the two new species is independent, and changes slowly accumulate due to 
natural selection, genetic drift, mutation, and so forth. There may be some 
evolutionary differentiation within a population due to selection or drift. But 
interbreeding among populations unites a species, whereas absolute speciating 
barriers definitively separate it from other species. Over the long run, species 
become different enough to be classified as new genera, families, orders, and so 
on, up Linnaeus's hierarchy. In the classic picture, complete isolation and the 
slow accumulation of differences allow for the reconstruction of relationships of 
descent by splitting over great time depths. 

The basic picture provides a clear causal explanation of the temporal and 
spatial coherence of species. Advocates of the biological species concept hold 
that only when this picture applies do we have species, properly speaking. 
However, several lines of evidence suggest that the absence of gene flow is 
neither necessary nor sufficient for the existence of coherent species in the sense 
of lumpy entities that show clear evidence of descent. Species can maintain their 
coherence without gene flow within the species, and species boundaries may be 
maintained despite gene flow between species. 

Some species have maintained species-typical phenotypes, including the 
ability to form fertile hybrids, despite long periods without any gene flow. For 
example, the checkerspot butterfly is found in scattered populations throughout 
California. Members of different populations are very similar morphologically 
and are all classified as members of the species Euphydryas editha. However, 
careful study has shown that there is virtually no gene flow among widely 
separated populations (Ehrlich and Raven, 1969). There are also many examples 
(Levinton, 1988) of cryptic “sibling” species that are long isolated but have 
evolved no detectable morphological differences. Some taxonomists claim that 
it is no more difficult to detect species in asexual organisms than it is in sexual 
organisms (e.g., Mishler and Brandon, 1987), despite the fact that there is no 
gene flow to unite asexual populations. 

Some species persist despite substantial gene flow (Barton and Hewitt, 
1989). A hybrid zone can exist between what seem to be good species, and often 
a few genes have clearly leaked across the boundary from one species to another. 
It would seem as if such species must either be formerly geographically isolated 
subspecies that will hybridize away or incipient species that will eventually 
evolve an isolating barrier. In fact, active hybrid zones between rather distinct 
species sometimes persist for long periods of time. Selection can apparently 
maintain the coherence of species both without any help from gene flow and in 
the face of substantial amounts of it. 

Things are not always so neat. In bacteria, genes are frequently transmitted 
horizontally among lineages (Eberhardt, 1990). Bacterial DNA exists in two 
distinct forms: most of the DNA is contained in a large chromosome, but about 
1 percent is contained in small loops of DNA called plasmids. The two forms of 
DNA are transmitted differently. For the most part, the chromosomal DNA is 
transmitted vertically. When bacteria divide, the chromosomal DNA is duplicated, 
and each daughter cell contains a copy. In contrast, plasmid DNA is transmitted 
horizontally from one bacterium to another during conjugation. Moreover, bacteria 
that are classified as belonging to different genera or families according to their 
chromosomal DNA readily conjugate and exchange plasmid DNA. As a result, 
genes carried on plasmids may jump from one lineage to another quite distant one. 
It is not certain that the two types of DNA are completely separate. Sometimes 
plasmid DNA may be incorporated into the chromosome, although if this occurs it 
is probably quite rare (Eberhardt, 1990). In the case of bacteria, there are really 
two sets of phylogenies: one for the chromosomal DNA and one for the 
plasmids. Relationships between these phylogenies break down rapidly because of 
the horizontal transmission of plasmids across chromosomal lineages. 

The opposite situation occurs with the lineages of hosts and parasites and 
predators in many animals and plants. For example, ectoparasites like lice and 
fleas are often isolated within their hosts, so that host and parasite phylogenies 
are similar despite no transfer between host and parasite genomes. 


The Common Properties of Genes and Species 

Genes and species are units at quite different levels of organization. For them, but 
not units between them on the scale of organization, deep phylogenies can usually 
be constructed. The reason is a pair of similarities. First, both units are replicated 
with great fidelity and change slowly due to ongoing evolution. Second, when 
daughter genes and species change, these changes are not effectively shared with 
sister lineages by mixing or any other form of communication. For systems with 
high rates of change, like mitochondrial genomes, deeper descent is obscured 
because recently evolved differences completely obliterate the ancient similarities 
that are necessary to detect descent. In the case of units like chromosomes and 
local populations with high rates of mixing, descent is generally untraceable 
because descent-derived differences are erased as rapidly as they arise. 

Genealogy is by itself not enough to generate much descent. There is a 
hierarchy of genealogical entities in biology: genes, chromosomes, individuals, 
populations, species, and communities. These are genealogical entities because 
they are all descendants of other entities at the same level. In the face of rapid 
mixing or evolution (or both), genealogy alone cannot preserve detectable 
patterns of descent, at least not for long. Note that patterns of descent are a matter 
of timescale. If we are interested in relationships over only a few splittings of 
daughter entities, these may be detectable in the face of considerable mixing and 
high rates of evolution. If we want to know relationships traceable many splits 
ago, the criteria are more demanding. 


Reconstructing Cultural Phylogenies 

Can we apply these ideas from biology to the analysis of human culture? 
Darwinian models of cultural evolution hold that culture is information transmitted 
from individual to individual by imitation, teaching, and other forms of social 
learning. Various processes cause the pool of cultural variants that characterizes a 
population to change through time. 

This view of culture and cultural evolution implies the existence of a 
hierarchy of genealogical entities analogous to the genealogical hierarchy of organic 
evolution. We do not know what the smallest unit of cultural inheritance is 
because we do not know in detail how culture is stored in brains. Nevertheless, 
scholars have proposed histories of quite small elements: particular words, 
particular innovations, elements of folk stories, and components of ritual 
practice. Such small elements are linked together in larger, culturally transmitted 
entities: systems of morphology, myth, technology, and religion. Such 
medium-scale units are collected together into “subcultures” and “cultures” that 
characterize human groups of different scales: kin group, village, ethnic group, 
nation, and so forth. Cultural subunits sometimes crosscut one another in complex 
ways, as when religion or occupation crosscuts ethnicity (much like bacterial 
chromosomes and plasmids). 

Four Hypotheses 

Reconstructing cultural phylogenies is possible to the extent that there are ge¬ 
nealogical entities that have sufficient coherence, relative to the amount of 
mixing and independent evolution among entities, to create recognizable history. 
There is a continuum of possible views about what units in the hierarchy of 
cultural descent satisfy these desiderata. It is useful to identify four regions along 
this continuum. 


Cultures as Species 

Cultures are isolated from one another or are tightly integrated. They contain 
within them powerful sources of isolation (ethnocentric discrimination against 
strangers) or coherence (such as organizing systems of thought that act as biases 
against ideas one by one, rather than strangers as whole individuals). Both 
mechanisms could cause cultures to act as single entities or “individuals” in the 
course of cultural evolution (see, e.g., Marks and Staski, 1988). By one 
mechanism or another, there is little cross-cultural borrowing of any significance. New 
cultures are formed completely by the fissioning of populations and subsequent 
divergence. In this case, whole cultures are analogous to species or mitochondrial 
genomes. Biological methods of systematics can be applied almost intact, and 
deep cultural phylogenies are relatively easy to infer for at least the bulk of a 
people’s culture. 


Cultures with Hierarchically Integrated Systems 

Although cross-cultural borrowing may be frequent for many peripheral com¬ 
ponents, a conservative “core tradition” in each culture is rarely affected by 
diffusion from other groups. New core traditions mainly arise by the fissioning 
of populations and subsequent divergence of daughter cultures. Isolation and 
integration protect the core from the effects of diffusion, although peripheral 
elements are much more heavily subject to cross-cultural borrowing. In this case, 
core traditions are analogous to the bacterial chromosomes and the peripheral 
components to plasmids. Biological methods of systematics can be modified to 
deal with cross-cultural borrowing. Reasonably deep core-cultural phylogenies 
can still be inferred, but this requires disentangling the effects of borrowing by 
distinguishing core and peripheral elements, and especially by methods to iden¬ 
tify elements that “introgressed” into the core. 


Cultures as Assemblages of Many Coherent Units 

Cultures could be quite ephemeral assemblages of small units, but the latter may 
have limited mixing and slow evolution. Culture may have no species, but it 
might have genes, plasmids, and mitochondria. Different domains may have 
different patterns of inheritance and different evolutionary histories. The 
components may be fairly large and plasmid- or mitochondrion-like, such as language, or 
small, solitary memes, such as the idea of using a magnetized needle to point 
north. Any given culture is an assemblage of many such units acquired from 
diverse sources. Methods of phylogeny can be applied independently to each 
domain. The essential problem is to determine the boundaries of the domains 
and establish that they are stable in time and space. 


Cultures as Collections of Ephemeral Entities 

There are no observable units of culture that are sufficiently coherent for 
phylogeny reconstruction to be useful. Observable aspects of culture could be the 
result of units that are beneath the resolution of current methods to observe. The 
forms of Acheulean hand axes are so similar that they cannot be used to infer 
anything about descent among their makers. Perhaps there were really many 
traditional ways to reach this apparently uniform end result. If we knew the 
details, we could reconstruct cultural phylogenies of hand ax making. There may 
be observable differences, but if they are the product of many recombining 
elements that cannot be observed, there is no information that would allow us 
to construct a phylogeny of the bits. Alternatively, if cultural evolution is 
sufficiently rapid, behavior may reflect such recent history that all phylogeny is lost. 
The “jukebox” culture, in which cultures are rapidly modeled and remodeled to 
serve current adaptive purposes, would have this effect due to functional 
convergence rapidly destroying any trace of history. 

There are two issues at stake. First, when using the term descent, what do we 
mean? Proponents of the view that whole cultures are like species use descent 
to describe cultural replication of complex coherent groups by the mechanism 
of group fission or budding, whereas those who believe that only components of 
culture cohere would use descent to describe ancestor-descendant relationships 
resulting from any pattern of culture preserving the footprints of its history. We 
shall try to be clear in our own usages, but this is a merely terminological issue to 
which we devote no further space. Second, what is the world like? This is a much 
more interesting question, to which we devote the rest of the chapter. At one end 
of the continuum, all of the elements that make up a culture cohere and resist 
recombination. Cultures as a whole are analogous to species. At the other end, 
the observed elements of culture are the result of memes diffused or invented on a 
timescale too short for phylogenetic reconstruction. What is culture really like? 

Mechanisms 

Several general mechanisms might cause longevity and coherence in cultural 
units so that descent can be determined. 


Longevity of Historical Traces 

As in the case of genes, the phylogenetic process of cultural transmission provides 
some level of historical continuity. As with genes, the deepest phylogenies are 
possible when culture changes slowly and is not subject to functional conver¬ 
gence. Slow evolution will occur when people either cannot, or have no reason to, 
invent new forms. Surprisingly simple bits of culture are often apparently too 
obscure to reinvent, and all known modern exemplars derive from a single in¬ 
vention. Needham (1988) gave many plausible examples of Chinese technology 
that subsequently diffused to the rest of Eurasia (e.g., the magnetic compass). 
Nonetheless, in the long run, functional convergence seems to be the rule for 
technology. A long tradition in the social sciences, including the classic cultural 
ecology of Steward (1955) and modern evolutionary anthropology, trades on 
the reality of substantial convergent evolution in human cultures. As in the 
biological case, the best elements for historical analysis are those that are functionally 
arbitrary and symbolic. Language and other symbolically meaningful, but 
nonfunctional, variations are often used as indices of descent, much as functionally 
neutral flower form is used in plant systematics. Flowers are a plant's way of 
communicating with pollinators, so the analogy with language is real. 

The next subsection describes some mechanisms that may prevent mixing 
between coherent elements. Similar mechanisms may act to slow the rate of 
evolution if internal innovations or innovators are perceived as strange, either 
because of a poor internal fit or because they arouse suspicions of heresy or 
deviance on the part of innovators. 


Processes That Give Rise to Coherence 

What general processes could give cultural elements an enduring coherence, 
leaving aside the size of cohering units and their relation to one another? In the 
symbolic and interpretive anthropology literature, the “glue” has been 
attributed to the “meaning” that inheres in culture. Meaningful cultural information 
provides a convincing and compelling Weltanschauung for its bearers. 
Meaningful components help organize and make sense of other parts of the cultural 
system and natural world. They also legitimize and justify the system in the 
minds of its bearers. For this reason, meaningful components have variously been 
called “root paradigms” (Turner, 1977), “ultimate sacred postulates” 
(Rappaport, 1979), “core principles” (Hallpike, 1986), and the like. Because they are 
critically important to a people’s understanding of the world and their place within 
it, they often have a special, even sacred, status. The notion of meaning is often 
linked to the idea of cultural holism. There is no logical reason for this limitation, 
and the idea may apply to cores or much smaller units. Subcultural units as small 
as the individual social scientific disciplines, street gangs, and clans often appear 
to have well-articulated systems of meaning. 

The special status of meaningful elements could provide coherence in 
several ways. First, the internal logic of a coherent block of culture may discriminate 
against intrusive elements. Diffused elements may be known to individuals, but 
the mismatch of meanings between whole cultures or subcultures entails that 
“foreign” values and ideas be misunderstood, disliked, and neglected. The 
mismatch may arise not only with foreign elements but also between domains within a 
single culture (e.g., gender-marked identities or even sets of subsistence skills). 

Second, meaningful culture often involves markers of group identity that are 
especially salient to the definitions of ingroup and outgroup. Contexts where co¬ 
herent units of meaning-rich culture are available for acquisition from foreigners 
are likely to involve marked ritual observances or ceremonies that mobilize 
ethnocentric sentiments more thoroughly than mundane contacts like trade, in 
which symbolically less marked elements may diffuse readily. Ethnocentrism can 
provide an effective isolating barrier to diffusion of cultural elements in theory 
and apparently in practice (Boyd and Richerson, 1987) at the whole-culture level. 
Class, caste, gender, occupation, and even hobby groups are symbolically marked 
within some societies. Within bounded groups, however large they may be, in¬ 
termarriage, diffusion, and other mixing processes create cultural uniformity, but 
there are sharp differences among them. This is a form of indirect bias. 

Third, to the extent that what coheres in culture is a symbolic system of 
organizing meanings, rather than the meanings themselves, it is protected from 
ordinary adaptive evolutionary pressures. In language at least, the symbol system 
is so rich and flexible that quite novel meanings can be coded with the 
existing system; only linguistically trivial changes in lexicons were needed to 
adapt modern languages to the industrial revolution. 

Finally, elements may cohere because certain combinations are adaptive and 
favored by natural selection or derivative adaptive decision-making rules. 
Adaptive forces may simply discriminate so strongly against recombinants that 
coherence is maintained despite massive mixing, as seems to be the case in 
certain hybrid boundaries in the biological case (Barton and Hewitt, 1989). A 
related sort of selective “glue” could come from the multiplicity of 
evolutionarily stable strategies that seem to exist in social systems. Perhaps the stability of 
coherent features comes from the failure of new or foreign social practice to fit 
into actual arrangements, rather than from inconsistencies at the 
cognitive/affective meaning level. The symbolic or ideological level may follow the social, 
rather than dictate it. 

Rushforth and Chisholm (1991) gave a possible example in their discussion 
of Athapaskan “structures of communicative social interaction.” According to 
these investigators, a core “framework of meaning and moral responsibility” has 
persisted among the Bearlake Athapaskan of northern Canada with “extraordinarily 
little change” across many generations and hundreds of years (p. 64). Moreover, 
remarkably similar beliefs and values—urging industriousness, generosity, 
autonomy, and restraint—have been documented among more than 30 other 
Athapaskan-speaking peoples across three geographically discontinuous clusters 
in Canada and Alaska, the Pacific Northwest, and the American Southwest. 

A deeply rooted family of social norms such as these might directly underpin 
social institutions. The norms that underpin social interactions are good 
candidates to be maintained as a coherent block because they are part of local 
evolutionarily stable strategies. In game theory, at least, it is easy to imagine locally 
and evolutionarily stable strategies for complex social institutions that are 
impossible to change at the margin by either diffusion or within-lineage change 
because small movements away from current practice are disadvantageous. 
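
A minimal game-theoretic sketch of this idea, written in Python, is a two-convention coordination game. The payoff values and the function names are hypothetical choices made for illustration. They are not drawn from Rushforth and Chisholm or from any particular model in the literature. 

```python
def payoff(strategy: str, share_p: float, a: float = 2.0, b: float = 1.0) -> float:
    """Expected payoff when a fraction `share_p` of partners follow convention P.

    Two P-followers earn `a`, two Q-followers earn `b`, and a mismatched
    pair earns nothing (hypothetical payoffs).
    """
    return a * share_p if strategy == "P" else b * (1.0 - share_p)


def resists_invasion(resident: str, eps: float = 0.05) -> bool:
    """True if a small fraction `eps` of deviants earns less than residents."""
    share_p = 1.0 - eps if resident == "P" else eps
    deviant = "Q" if resident == "P" else "P"
    return payoff(resident, share_p) > payoff(deviant, share_p)


print(resists_invasion("P"), resists_invasion("Q"))  # True True
```

Both conventions resist small deviations, including convention Q, even though everyone would earn more under P. A practice that is locally stable in this sense can persist whether new variants arrive by diffusion or by innovation within the lineage. 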

Would the multiple evolutionarily stable strategies (ESS) explanation ac¬ 
count for the remarkable cultural persistence of Athapaskan norms? Focusing on 
the Bearlake version, Rushforth and Chisholm (1991) suggested that “the 
Bearlake interpretive scheme has persisted because of the historically stable 
composition of the [social interaction] strategies it informs” (p. 119). They 
argued that Bearlakers pursue goals in daily life that are defined and valued by 
their interpretive framework of beliefs and values. The interactions that follow 
generate regular rewards or “payoffs” that encourage individuals to convey 
certain intentions to others. But the actions that convey these intentions are 
precisely those defined by the framework. In short, the framework persists as “an 
unintended consequence of the strategic behavior of individuals operating in 
their own interests” (p. 121). 

Sometimes coherent traditions are “acquired” by imposition by an invading, 
dominant culture, or assimilation to an attractive one. Even in this case, little 
admixture from the competing coherent structure of the adopting culture need 
result from its transfer from one biological population to another, as in the 
imposition of a common Greco-Roman urban civilization on a host of “barbarian” 
peoples in ancient Europe and Western Asia. Note that individual people can 
move readily without disturbing the integrity of the coherent elements, as the 
assimilation of many immigrant people to at least aspects of Anglo-American 
culture over the last two centuries testifies. Nevertheless, replication by transfer 
to a new biological population is arguably normally accompanied by much mixing 
of old and new, and the fission of one population into two daughters probably 
conserves coherence more effectively. Similarly, high rates of immigration need 
not necessarily result in high rates of erosion of coherence, but cultural diffusion 
does seem likely to be stimulated by immigration in typical cases. 


Evidence 

The Descent of Cultures as Wholes 

Commentators such as Marks and Staski (1988) sometimes imply that they 
defend this position. According to McNeill (1986), historians such as Toynbee 
imply a position as extreme as this end of our continuum, although without any 
specific defense. McNeill’s own magisterial Rise of the West was written to 
demonstrate that it was not possible to write a world history without 
acknowledging the exchange of ideas among major culture areas, much less within 
them. Holistic arguments, ultimately deriving from Hegelian philosophy, 
once enjoyed great appeal in history and many branches of the social sciences, 
and echoes remain. For example, in linguistics, de Saussure (1959) is often cited 
as a proponent of extreme systemicity in language, and even today some linguists 
espouse this view (Wardhaugh, 1992). The limitations of such arguments have 
long been recognized by philosophers, and more recently by social scientists. 
There is such overwhelming evidence for substantial diffusion and rapid evolu¬ 
tion in many components of culture that it is unlikely that any tenable empirical 
defense of a completely holistic cultures-as-species position can be offered. 


The Descent of Core Traditions 

The hierarchical hypothesis of large-scale cultural coherence rooted in a core 
tradition is a point along the continuum that warrants closer examination. Like the 
previous hypothesis, it assumes that culture is an ideational system (i.e., it 
consists of widely shared ideas, values, and beliefs that shape behavior in local 
human populations; the named cultures of anthropologists). In this model, 
cultures are viewed as hierarchically integrated systems, each with its own 
internal gradient of coherence. At one extreme in the gradient are the “core” 
components of a culture—those ideational phenomena that constitute its basic 
conceptual and interpretive framework and influence many aspects of social life. 
At the other are peripheral elements that change rapidly or are widely shared 
by diffusion. On this hypothesis, the processes of coherence generate one main, 
central core unit. But this central unit does not equally organize all elements of 
culture. There may be many other smaller elements that are only lightly or not at 
all influenced by the core. 

Core versus Periphery. Regardless of whether the core gets its coherence from 
meaning, protection, diffusion, structured social interaction, or from all these 
sources, the key assertion of this model is that core components exhibit 
a remarkable resilience in the course of cultural history. The core “sticks
together” as a cohesive bundle even through repeated episodes of culture birth,
giving rise to a set of descendant branches that then share the same “tradition.” 
As Vansina (1990) argued based on his case study, such traditions are based upon 

the fundamental continuity of a concrete set of basic cognitive patterns 
and concepts.... [The] continuity concerns basic choices which, once 
made, are never again put into question.... These fundamental acqui¬ 
sitions then act as a touchstone for proposed innovations, whether from 
within or without. The tradition accepts, rejects, or molds borrowings 
to fit. It transforms even its dominant institutions while leaving its 
principles unquestioned, (p. 258) 

Despite these numerous sources of cohesion, the hierarchical hypothesis 
holds that many “peripheral” components exist that are only loosely tied to the 
core framework. These diffuse freely and readily, as in the well-studied case of 
technical innovations (Rogers, 1983). Peripheral components may include ide¬ 
ational elements that make sense on their own and can be socially transmitted 
without a lot of supplementary cultural information. Such components are as¬ 
sumed to play little or no organizational role within the broader ideational 
system, and they must be relatively easy to learn. Such components are expected 
to be highly “contagious,” rather like Dawkins’s (1993) viruses of the mind.

New forms will be adopted quickly, simply, and smoothly, particularly if 
there is some perceived functional advantage and low cost. In this instance, 
change is quick and easy: different components come and go as independent 
interchangeable parts. They are likely to spread horizontally among cultures, 
regardless of whether those cultures are related historically by branching. For 
this reason, their phylogenies will have the vine-like appearance mentioned 
earlier. Kroeber (1948) gave a long list of well-known examples (e.g., days of the 
week, tobacco, printing, paper, gunpowder, etc.). Unlike the descent-of-wholes 
hypothesis, the hierarchical hypothesis recognizes that cores are not as com¬ 
pletely isolated as good biological species. Kroeber's “tree of culture” implies 
that cultural descent is like a rain forest canopy tree—one whose crown is a 
tangle of branches (related by birth) and vines (related by diffusion). For some
substantial period of time, one can, with care, distinguish what grows as branches
from what grows as vines, even in a thick, old tangle. Eventually,
however, over the course of thousands of years, vines will proliferate and come 
to obscure the branches. At the same time, processes of coherence will integrate 
elements with separate histories. Old vines will coalesce to form a solid trunk— 
much like the strangler fig that starts out as a viny parasite of a tree, but gradually 
forms a solid trunk about its host, which then dies. 

The hierarchical model also acknowledges the rapidity of cultural evolution, 
compared with the biological case. The evidence of a history of common descent 
will gradually disappear in independent lineages. Barth (1987) gave a detailed 
account of the rapid evolution of the core tradition of the Mountain Ok of New 
Guinea due to a mutation-like process. The case is probably unusual because the 
core traditions are transmitted in rare secret rituals that create high “mutation” 
rates via forgetting. But even in the absence of diffusion, evidence of common 
ancestry in sister cultures will degrade on the millennial timescale (compared
with hundreds of millions of years, in the case of sister species of mammals). We 
know from the massive convergence of agricultural technology and state-level 
social institutions in the pre-Columbian New and Old Worlds that cultural 
evolution can produce spectacular adaptive change on the timescale of a few 
thousand years. We can be almost certain that Old-New World similarities were
independently derived convergences, but only because we have the evidence of 
hundreds of cultures on both branches to help distinguish the vines. Notoriously, 
careless historians who ignore the massively redundant evidence have no trouble 
“finding” false descent relationships between Old and New World cultures (e.g., 
Heyerdahl, 1950).

The Practice of Constructing Core-Cultural Phylogenies. The hierarchical hypoth¬ 
esis is supported to the extent that it can be shown that a large complex of core 
traits has a common pattern of descent. The core traditions in question must be 
related through a sequence of population fissionings (allowing for the odd core 
transfer). The existence of only one deep element, such as language, cannot be 
used alone to infer the existence of a full core of shared traditions among cultures 
related by language only. Because language phylogenies can be traced to consid¬ 
erable depth using conservative aspects of vocabulary and phonology, language 
trees are the usual starting point for attempting to trace out the descent patterns 
of larger core units. Related traditions can then be used as a basis for reconstructing 
a fuller culture history, including the “proto-tradition” out of which they evolved 
(see Aberle, 1984, 1987). Sometimes genetic relatedness of the populations in¬ 
volved provides supplementary evidence, given that full core replication by pro¬ 
cesses other than fission of a parent culture is unusual. However, if diffusion and 
rapid evolution swamped all traces of relationship by birth, anthropology could 
not speak of branches, only vines, and hypothesis 3 would be supported. 

The work of Rushforth and Chisholm on Athapaskan similarities illustrates 
the method. Linguistic evidence indicates that Athapaskans are part of a second 
wave of Native Americans that arrived from Asia a few thousand years after the 
migration that contributed most known pre-Columbian populations. At contact, 
the Athapaskan language family was spoken by people in quite isolated clusters 
in Canada, California, and the Southwest (the Southwestern group includes the 
famous Apache and Navajo). According to their analysis, the evidence suggests 
that a core of meaning related to social behavior coheres with language and that 
all are “cognate,” (i.e., related historically by culture birth; Rushforth and 
Chisholm, 1991). 

First, the authors implied that the pertinent beliefs and values in Athapaskan 
populations are distinct from those of the surrounding populations belonging to 
other language groups (although it is also true that the differences are not thor¬ 
oughly documented in their presentation). Second, similarity by diffusion can be 
ruled out because of the highly discontinuous geographical clustering of the 
carrier populations. Third, independent origins are highly improbable (Rushforth 
and Chisholm, 1991), even if each cluster of populations is taken as a whole. 

Rushforth and Chisholm (1991) concluded that the pertinent beliefs and 
values are all “genetically” related, having “originated in, and developed from,
a common, ancestral cultural tradition that existed among Proto-Athapaskan or,
perhaps, even among [the ancestral] NaDene peoples” (p. 71). As they put it,
“simplicity strongly argues” that “this cultural framework originated once, early 
in Proto-Athapaskan or NaDene history and has persisted (perhaps with some 
modifications) in different groups after migrations separated them from contact 
with each other” (p. 78). 

The work of Indo-Europeanists to reconstruct the descent of societies 
speaking this family of languages is the most ambitious attempt yet made to 
reconstruct a pattern of descent for a core. According to some Indo-Europeanists 
like George Dumezil and Marija Gimbutas, the Indo-Europeans are the bearers 
of a core tradition consisting of language elements, myths, and a distinctive 
tripartite pattern of social organization that had its origin in a particular culture 
of steppe horse nomads. Gimbutas’s reconstructed “Kurgans” lived about 6,500 
years ago between the Black and Caspian Seas. Her Kurgan proposal is widely 
respected but also widely criticized; a reconstruction of such breadth and depth 
tests the margins of the hierarchical hypothesis (Mallory, 1989). 

Shared core traditions have been proposed for people in a number of dif¬ 
ferent regions of the world, each with time horizons dating back at least a few 
thousand years. Recently reviewed in Durham (1992), these include the oft- 
cited case of cultural similarity among Polynesian islanders (see especially Kirch, 
1986; Kirch and Green, 1987; see critical review in Terrell, 1986), the Atha- 
paskan (Rushforth and Chisholm, 1991) and Indo-European traditions men¬ 
tioned earlier (e.g., Gamkrelidze and Ivanov, 1990; Hallpike, 1986; but see 
Mallory, 1989), Mayans (Vogt, 1964), Tibetans (Durham, 1991), and Tupi 
speakers among native South Americans (Durham and Nassif, 1991). Although 
one could always argue that the Polynesian case is exceptional because of the 
inherent isolation of its populations, plausible examples of enduring shared 
traditions among cultures related by birth have now been proposed for a diverse 
array of continental populations as well. 

Consider Vansina's (1990) recent comprehensive study of political tradition 
in equatorial Africa. Through a controlled comparison of some 200 distinct 
societies in the basin of the Zaire river and its tributaries, Vansina concluded that 
these “widely differing societies arose out of [a] single ancestral tradition” (p. 
191) by way of 3,000-4,000 years of historical transformations. As reconstructed 
by Vansina, the original ancestral tradition came into the region with the im¬ 
migration of western Bantu-speaking farmers. They brought with them a single 
distinct pattern of social organization based on fragile temporary alliances into 
House (capital H in original), village, and district, and a common ideology and 
world view to go with it (see Vansina, 1990). 

From this common baseline, Vansina (1990) argued, through successive 
splits, migrations, and expansions, “widely differing societies arose out of the 
single ancestral tradition by major transformations” (p. 191). The variation in¬ 
cluded, for example, two kinds of segmentary lineage societies, four kinds of 
associations, and five kinds of chiefdoms or kingdoms. All the while, “the 
principles and fundamental options inherited [at birth] from the ancestral tra¬ 
dition remained a gyroscope in the voyage through time: they determined what 
was perceivable and imaginable as change” (p. 195). 
Vansina made it clear that outside influences—“the new habitants, the
autochthons [indigenous hunter-gatherers in the region], the non-Bantu, the
eastern Bantu farmers with their different legacies—each influenced the development
of this ancestral tradition differently from place to place” (p. 69). Yet as
he repeatedly showed, change “was not mainly induced by outside influences. In 
all these cases [for example, in the inner Zaire basin] a chain of reactions fed 
continuous internal innovations. Outside innovations were accepted only insofar 
as they made sense in terms of existing structures” (p. 126). Even in regions 
where external influences played a relatively heavy role, the internal sovereignty 
of distinct polities meant that “internal dynamics always remain determining” 
(p. 192). Even with the establishment of Atlantic trade after 1480 and the
attendant challenges of slave raiding and more, “the tradition was not defeated. 
It adapted. It invented new structures. [N]o foreign ideals or basic concepts were 
accepted and not even much of a dent was made in the aspirations of in¬ 
dividuals” (p. 236f). Inherited at birth in each equatorial society, the tradition 
lived on for hundreds of years more, only to be destroyed by European conquest 
between 1880 and 1920. 

Why Core Homology Matters. Vansina’s (1990) study illustrates a key proposi¬ 
tion of the hierarchical model. Even in continental areas with high contact be¬ 
tween peoples, one can still trace “the historical course of a single tradition” 
(p. 261). But there is a second important implication as well: reconstructing the 
histories of peoples without written records requires that one distinguish be¬ 
tween homologies (similarities produced by culture birth), analogies (similarities 
produced by convergence or parallel change), and synologies (similarities pro¬ 
duced by diffusion or borrowing). The reason, as Vansina noted, is that the 
reconstruction of past cultures requires that one “seeks out homologies first” (p. 
261). Only by identifying genuine cultural homologies can one establish the 
nature of the initial ideational system that was later transformed by historical 
processes. To the extent that hypothesis 2 of the four proves valid, it offers a 
useful tool that societies with no written records can use to gain access to their 
own histories. 


The Descent of Small Cultural Components 

On this hypothesis, there is no central core culture that deserves special atten¬ 
tion in phylogenetic analysis. Rather, there are multiple “cores” and sometimes 
quite small units whose descent can be usefully traced. To characterize a narrow 
region on the continuum of possible hypotheses, we suppose that even the 
biggest deeply coherent blocks of culture are fairly small. 

Definition. The components are collections of memes that are transmitted as units
with little recombination and slow change, and therefore their phylogenies can 
be reliably reconstructed to some depth. (As for the hierarchical hypothesis, how 
much recombination and change are tolerable depends on the timescale—deeper 
phylogenies require more coherent units and slower rates of evolution.) On 
this hypothesis, different components diffuse and recombine at a rapid rate, 
compared with the rates of elements within components, so that core-like
complexes of components will have shallower phylogenies than their smaller con¬ 
stituent components. 

The processes that provide “glue” for the hierarchical core hypothesis also 
explain the coherence within these smaller units. The amendments needed are 
only quantitative. If the scope of integration provided by internal processes is 
limited, and if ethnocentric barriers to diffusion are weak or shift in which kinds of
components are protected, recombination between large blocks of memes will be
high, although the same processes may protect many small sets of coherent memes. 
In practice, the units have to be large enough to have significant internal com¬ 
plexity, or their actual documented history has to be good. Otherwise, the amount 
of information available for descent reconstruction is limited. Thus, before the 
advent of modern molecular techniques, the functionally similar genes in various 
bacteria had a pattern of descent, but the traces of history needed to reconstruct 
the pattern were absent. When genes can be sequenced, a vastly greater array of 
data is available by reading the DNA strands directly. Strings of functionally 
irrelevant, highly improbable similarities and differences in the strands can now 
be used to construct phylogenies where classical biologists despaired. 

Is there any theoretical reason to expect smaller, rather than larger, coherent 
units in the cultural case? The fact that different cultural variants can be acquired 
from different people during different parts of the life cycle makes genealogical 
processes less effective at maintaining coherence than the analogous processes 
in the case of genetic evolution. We all have many cultural parents, with the 
attendant potential for independent samples of culture from many sources. At the 
same time, mixing could be less effective within small units because one can 
learn some things from one person or a small group of closely related mentors 
and other things from a quite different set of mentors. This may lead to small, 
but coherent, subcultures within a larger culture complex. For example, the cul¬ 
ture of science is fairly coherent and coexists within the same society as the 
culture of rock climbers, but people from each of these partial cultures may share 
the partial culture of the English language. (Of course, to some extent, science, 
rock climbing, and English are international institutions and provide avenues of 
communication among the cultures that play host to them.) On this argument, 
maintaining cultural coherence over large units faces a considerable mechanical 
obstacle due to the hyperrecombinatorial nature of the cultural transmission 
system. 

If one focuses on one special unit, such as those few features of language that 
cohere over long timescales, one may indeed find a few correlated units of other 
types that persist in having a pattern of descent in common with the language 
features, merely as a matter of chance. From one attempt at deep reconstruction 
to another, different pseudocore elements will be discovered. 

The linguistic characters used by historical linguists (basic lexicon, phono¬ 
logical rules) provide good examples of what is meant by a cultural component. 
Linguists can reconstruct a phylogeny for a basic lexicon and phonological rules 
that tells us the pattern of relationships among variants of this character. For 
example, we know that the basic lexicon and phonological rules that characterize 
English and German share a more recent ancestor than either does with French. In 
other words, we believe that we can trace back the sizable complex of memes that
underlie the English basic lexicon and phonology through a series of ancestor- 
descendant pairs to a point where the same people speak a language that has 
phonological rules and a basic lexicon that also forms the ancestor of German. 

Examples of Coherence of Small Units and Recombination among Them. A clear 
example of how sets of memes exhibit considerable coherence when borrowed 
between groups can be seen in the adoption of the “age organization” principle 
by Bantu peoples in Central and Eastern Africa (LeVine and Sangree, 1962). Age
sets are an institution in which children born within a few years of one another
are simultaneously initiated into a group of adolescents of nearly the same age
(boys and girls into different sets). After initiation, a given age set is a corporate
organization that is formally charged with a series of roles in succession
(warrior, married man, elder, etc.), with formal graduation from role to role of the
whole set. 

The Tiriki (an offshoot of the Abaluhyia Bantu), for example, currently have
an age organization almost identical to that of their Nilotic neighbors, the Terik, 
while remaining distinctively Abaluhyia in language and culture. This situation 
arose as a result of intense political turmoil in the mid-eighteenth century, when the 
Terik offered asylum to refugee segments of Abaluhyia lineages on condition 
that their men would become incorporated into the Terik warrior groups. At this 
time, the Tiriki warriors accepted the full set of initiation rituals for their sons 
(circumcision and seclusion) and adopted the seven named age-set system. In
addition, the grades of warrior, retired warrior, judicial elder, and ritual elder 
emerged as the principal corporate units of political significance at the local level, 
and the Nilotic ideology of bravery and prowess in battle became predominant. 
Indeed, there is some evidence that the Tiriki became a distinct group within the
Abaluhyia as a result of their adoption of Terik customs, as is suggested by
their name. Interestingly, the practice of female circumcision was viewed with
disfavor by the Tiriki, such that they never adopted this trait. In short, this 
example shows how a number of cultural elements can be borrowed as a package, 
although not indiscriminately so, and the packages are often smallish. 

Linguistics also provides many good examples. Important components of the 
language spoken by a group of people often have a different evolutionary history 
from the basic lexicon and phonology of the same language. A substantial fraction 
of the words in the English lexicon (but not in the basic lexicon) share more
recent common ancestors with words in French than with German. This is also 
true of English syntax, subject-verb-object like French, not subject-object-verb 
like most Germanic languages. It is even true of aspects of English phonology. For 
example, English speakers distinguish veal and feel, apparently as a result of the 
influence of Norman loan words. Thus, we can identify coherent cultural entities, 
words, and syntactical and phonological rules that are longer lived than the larger 
complex called the English language, and whose ancestry can be traced back 
through independent series of ancestor-descendant relationships. Thomason and 
Kaufman (1991) provided numerous other examples, including the Ma’a language
spoken in northern Tanzania, which, despite classification as a Nilotic
language, has a basic lexicon related to Cushitic languages and a grammar related 
to Bantu languages. (We return to the problems that this example raises for the
practice of linguistic classification later.) 

Less formal data suggest that important social organizational rules and values
are often decoupled rather rapidly from descent, as reckoned by the
use of a basic lexicon and phonology. In Central and East Africa, for example,
cyclical and linear age sets, alternating generation classes, genital mutilation of 
males and females, warrior organizations, and many other associated practices 
are common among people whose basic lexicons are categorized as Nilotic, 
Cushitic, and Bantu. Although it was once thought that these customs were 
essentially of Cushitic origin, it is now clear from Ehret’s (1971) linguistic 
analyses and voluminous ethnographic sources that different customs associated 
with the recruitment, function, and ritual validity of age organizations have been 
repeatedly borrowed between protolinguistic units over the last 5,000 years, 
reflecting periods of proximity, expansion, and dependence. The resulting situ¬ 
ation is one of a thorough intertwining of social organization and language. 

In some cases, the distribution of cultural traits appears to represent func¬ 
tional convergences, as in the case of the Tiriki, who adopted age sets and male 
circumcision in response to the turbulent militaristic conditions of the times. In 
other cases, there is evidence of a decoupling of apparently nonfunctional details. 
Thus, the Bantu Gusii conduct male and female genital mutilation but appar¬ 
ently have never organized their men into age sets (LeVine and Sangree, 1962); 
the Datoga dropped the 5-8 cycling age-set system of their protosouthern Ni¬ 
lotic ancestors for noncycling generation classes (Ehret, 1971). The Bantu Kuria 
provide a particularly revealing example of this complexity (Tobisson, 1986). 
Men belong to age sets almost indistinguishable in name from those of the 
southern Nilotes but are recruited on entirely different principles (father’s set 
membership, rather than circumcision cohort). However, the Kuria have im¬ 
portant military units; these are based on circumcision but are organized quite 
differently from those of the Nilotes and are quite unrelated to the age-set 
system that among the Kuria bears Nilotic names. The inescapable conclusion to
be drawn from these complex observations is that the phylogenies of language and
of other cultural characters are often distinct.

Religious practices provide many further examples: the spread of the Sun 
Dance on the Great Plains, the spread of Islam from Western to Central and 
Eastern Asia and Northern Africa, millenarian movements in Melanesia, and so 
on. Ethnographic details are sometimes available for such borrowings, and the 
motives involved do not seem to be such as to enforce much coherence. For 
example, Sierra Leonean Creoles first adopted freemasonry in the late 1940s. 
The reason seems to have been that exclusive occupation of elite political roles 
had long served Creoles with an integrative community symbolic system. When 
Creoles lost power to the large majority of tribal peoples without a slave back¬ 
ground, this symbol system was lost. Freemasonry happened to be an available 
substitute and quickly became very important (Cohen, 1974). Of course, na¬ 
tional and imperial powers sometimes maintain symbolic units over wide areas 
for impressive periods of time. The Habsburgs’ success in defending Catholi¬ 
cism and expelling Protestantism and Islam from their dominions during the life 
of the Austro-Hungarian Empire is a famous example. However, the need to 
exercise a large measure of brute force to succeed in such an enterprise is perhaps
testimony to the long-run weakness of large-scale coherence.

There also may be rather well-bounded subcultures within a language group 
(as defined by a basic lexicon), as in the Indian caste system or the class, occu¬ 
pational, and religious subunits of many other state-level agricultural societies. 
Here, some memes are confined to some subset of the group—the castes, the 
guild, and so on. These subgroups may be marked by boundaries that are rather 
impervious to the flow of at least some kinds of memes. This phenomenon 
reaches its extreme in contemporary societies like the United States, where a 
diverse array of specialized subcultures of many types exists. 

These subgroups may be far more enduring than the “cultures” to which 
they bear a somewhat temporary allegiance. For example, East Africanists often 
question the attribution of any time depth to the ethnic units currently residing 
in the area. This is not simply a consequence of European colonialist policy. 
Thus, Waller (1986) painted a picture of the nineteenth-century and earlier 
ephemeral political associations of clans with different linguistic and cultural 
backgrounds, linked through diverse patterns of intermarriage, trade, expansion, 
and dependency. These flexible and highly inclusive concepts of group identity 
are seen as an adaptation to heterogeneous and somewhat unpredictable envi¬ 
ronmental conditions (i.e., circumstances by no means unique to East Africa). 
Knauft (1985) told a similar story about the Gebusi and their neighbors, the 
Bedamini, in the Fly River area of Papua New Guinea. According to this picture, 
there would be frequent recombination of memes due to temporary association 
of peoples who exchange memes while in contact. 

Comparison of Core and Small Units Hypotheses. Whether such examples are more 
representative than those given by supporters of the core hypothesis is an im¬ 
portant, but unanswered, question. The little anthropological work done is not 
capable of answering this question. There are a few studies, but they are inde¬ 
cisive. Jorgensen’s (1967, 1980) studies of the Salish and larger-scale analysis of 
the Indians of western North America are examples of the kind of comprehensive 
cultural analysis that might deliver. However, his methods are based on measures 
of overall similarity and difference and do not constitute proper analyses of descent.
Biological systematists argue that the only evidence for membership in a
given branch of a descent tree is given by characters that are shared by that branch
alone, not by more ancient or more recent similarities, much less by similarities
acquired by convergence.
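
The contrast between overall similarity and shared derived characters can be made concrete with a minimal sketch. The three groups A, B, and C, their trait scores, and the assumption that 0 is the ancestral state of every trait are all hypothetical and chosen only for illustration; they are not drawn from any of the studies cited here.

```python
from itertools import combinations

# Hypothetical presence/absence scores for eight traits in three groups.
# Assumption for illustration only: 0 is the ancestral state of every trait.
traits = {
    "A": [1, 1, 0, 0, 0, 0, 0, 0],
    "B": [1, 1, 1, 1, 1, 1, 0, 0],  # shares two derived traits with A, then changed a lot
    "C": [0, 0, 0, 0, 0, 0, 1, 0],  # mostly retains ancestral states
}

def overall_similarity(x, y):
    """Count every matching trait, ancestral or derived."""
    return sum(a == b for a, b in zip(x, y))

def shared_derived(x, y, ancestral=0):
    """Count only traits on which both groups share a non-ancestral state."""
    return sum(a == b and a != ancestral for a, b in zip(x, y))

def closest_pair(score):
    """Return the pair of groups that the given score ranks as most alike."""
    return max(combinations(sorted(traits), 2),
               key=lambda pair: score(traits[pair[0]], traits[pair[1]]))

print(closest_pair(overall_similarity))  # pairs A with C on retained ancestral states
print(closest_pair(shared_derived))      # pairs A with B on shared innovations
```

In this toy case the overall-similarity measure links A to C only because both retain many ancestral states, whereas the two innovations shared by A and B are the only evidence that bears on branching order.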

Even in the case of language, “wave” models of linguistic evolution 
have long contended with “genetic” analyses based on strict criteria of descent
(Jorgensen, 1980; Mallory, 1989; Renfrew, 1987). Many features of Indo- 
European languages seem easier to account for if we assume that the whole 
family was in contact throughout most of its history and that innovative features 
tended to diffuse from multiple centers to neighboring languages. Treelike 
models of relationship can certainly be constructed for data that are substantially 
influenced by wavelike processes (e.g., with clustering algorithms). Just because 
a tree diagram explains much of the variation in a set of data, it does not guar¬ 
antee that the descent hypothesis is correct. It would be quite interesting to see 
the modern “cladistic” methods of biological systematists formally applied to such
cultural descent problems. At least part of the solution to the debate between 
proponents of hierarchical core and small units hypotheses will rest on the appli¬ 
cation of sharper methodological tools, and biologists have something to offer. 
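
The point that a clustering algorithm will return a tree even for wavelike data can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the groups sit on a line, each trait arises in one randomly chosen group and diffuses only to its immediate neighbors, and no group descends from any other. The clustering is plain average linkage on Hamming distance, not a method used by any author cited here.

```python
import random
from itertools import combinations

def wave_data(n_groups=6, n_traits=40, reach=1, seed=0):
    """Pure diffusion: each trait arises in one group on a line and spreads
    to neighbors within `reach`. There is no branching history at all."""
    rng = random.Random(seed)
    data = [[0] * n_traits for _ in range(n_groups)]
    for t in range(n_traits):
        origin = rng.randrange(n_groups)
        for g in range(max(0, origin - reach), min(n_groups, origin + reach + 1)):
            data[g][t] = 1
    return data

def hamming(a, b):
    """Number of traits on which two groups differ."""
    return sum(x != y for x, y in zip(a, b))

def average_linkage_tree(data):
    """Agglomerative clustering; returns a nested tuple of group indices."""
    clusters = {(i,): i for i in range(len(data))}  # member indices -> subtree
    while len(clusters) > 1:
        def dist(pair):
            a, b = pair
            total = sum(hamming(data[i], data[j]) for i in a for j in b)
            return total / (len(a) * len(b))
        a, b = min(combinations(sorted(clusters), 2), key=dist)
        merged = (clusters.pop(a), clusters.pop(b))
        clusters[a + b] = merged
    return next(iter(clusters.values()))

def leaves(tree):
    """Flatten a nested-tuple tree into its list of group indices."""
    if isinstance(tree, int):
        return [tree]
    return [leaf for subtree in tree for leaf in leaves(subtree)]

tree = average_linkage_tree(wave_data())
print(tree)  # a fully resolved binary tree, although no group has an ancestor
```

The algorithm has no way to decline to produce a tree, so the existence of a treelike summary says nothing by itself about whether the similarities arose by descent.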


The Descent of Memes 

The boundary of the small units hypothesis toward the small end of the con¬ 
tinuum is not well defined. It is also possible that, aside from core vocabulary and 
phonology, there are few multimeme cultural units that are well protected from 
diffusion. It could be that each of the cultural things we observe is affected by 
many memes, that these memes readily diffuse from one socially or linguistically 
defined group to another, and that memes that affect different cultural com¬ 
ponents readily recombine. For example, a religious system might be affected by 
many different memes: beliefs about causation, beliefs about the role of men and 
women, beliefs about disease, and so on. This system could diffuse from one 
group to another, and then some of the memes could recombine with other 
aspects of the culture. Beliefs about the roles of men and women that came with 
the new religious system might then recombine with preexisting beliefs about 
subsistence practices, generating new, observable subsistence variants. If we 
could actually measure the memes that characterize different human groups, this 
case would be much like the previous one, except that we would largely reconstruct the
phylogenies of memes instead of whole cultural components.


Descent Analysis: Impossible or Uninteresting? 

There are several situations in which descent analysis regarding culture is im¬ 
possible. If we observe phenotype, and not the mental representations that are 
stored and transmitted, we cannot directly measure memes. The fact that many 
memes affect any given observable cultural attribute makes it difficult to trace 
the path of recombining memes, and reconstructing phylogenies is likely to be 
impossible. If the actual units to which descent might apply are as small as or 
smaller than our practically observable units, descent is impossible to trace 
simply because there is not enough information available to separate common 
descent from other hypotheses, such as independent origins. A quantitative 
character subject to blending inheritance is an extreme example. 

In some cases, methodological improvements may increase resolution. 
Comparative ethnographic data with age sets scored as present/absent, or as a 
quantitative variable on political importance, would not contain enough detail to 
reconstruct much history in East Africa. A richer data set offers more possibil¬ 
ities, as we have seen. 

The existence of coherent cultures will depend on the rate of diffusion and 
independent evolution. If the rate of diffusion among cultures for most characters
is high, then there will be no cultural unit larger than some small atomistic
unit of which to track the descent. Between the time that a newly formed group 
buds off its parent, and the time it creates buds itself, many new traits will have 
entered the group from outside. If the rate of evolution is high, the trace of 
history also vanishes. High rates of random evolution, especially in simple characters
with few observable states, will eventually result in so many random
“hits” that descendant characters will have occupied all states fairly recently.
Similar simple artistic motifs are found in many cultures, perhaps because artists 
frequently rediscover and abandon them. Functional convergence presents similar
problems. Around the world, tropical horticulturalists often live in small-scale
societies that are murderously hostile to their neighbors. This commonality
is presumably a by-product of the population densities and level of political 
organization supportable in wet tropical climates, not due to common ancestry. 

Even when descent analysis is possible, it may be uninteresting. The few 
components that resist diffusion—basic lexicon and so on—will be descended 
from the grandparental group (defined in terms of basic lexicon), but most 
components will not be descendants of components in that same grandparental 
group. Put another way, a culture is nothing more than its most elementary 
components. Each component may well be traceable back to a grandparental 
society. But a neighboring society may share particular grandparents for particular 
traits at random. Phylogenetic analysis could still be conducted for an element-by-element
case, and this might be of interest or utility for some special cases.
However, one important use of phylogeny is to make manageable the overwhelming
complexity of populations and cultures. With no coherence, the
analysis of descent could promise nothing in this regard.


Partial Phylogenies and the Study of Adaptation 

Good phylogenies are crucial for the proper study of adaptation using the 
comparative method. Comparative studies attempt to determine the function 
of various attributes by looking for predicted correlations among societies. For 
example, Thornhill (1991) hypothesized that inbreeding avoidance rules function
to preserve capital in powerful families. To test this hypothesis, she collected
data on inbreeding rules and social stratification, predicting (accurately)
that the degree of elaboration of rules would positively correlate with the degree 
of social stratification. 

Similar studies utilizing correlations among species are widely used in 
comparative biology. A key problem in such comparative studies is determining 
the extent to which different societies (or species) are independent data points. 
In comparative biology, only independently derived associations are counted as 
separate data points. Thus, if an innovation arises and then the lineage speciates, 
preserving the innovation in both daughter species, the daughter species should 
be counted as a single data point. The first step in the proper exercise of the 
comparative method is phylogenetic reconstruction (Harvey and Pagel, 1991). In 
cross-cultural anthropology, this problem is referred to as “Galton’s problem.”
Scholars working in this discipline attempt to select their samples so as to include
only unrelated cultures or correct for diffusion by using statistical methods
(Burton and White, 1987). 

Adaptations acquired by diffusion from other groups are related by descent 
to the adaptations in those groups. If one analogizes with the practice in biology, 
such adaptations would not be counted as independent cases because the adaptation
in the borrowing group is not an innovation. However, to the extent
that diffusion represents the goal-driven choices of individuals in the borrowing 
group (or some other potentially adaptation-producing process), the borrowed 
trait is independent. If it had not been an adaptation, it would not have been 
adopted. This problem is particularly acute given that the rate of diffusion of 
new cultural adaptations through biased transmission is likely to be much higher 
than the rate of innovation. If this is so, most groups will adapt by borrowing, 
and it is unreasonably conservative to disregard these cases. 

The relationship between the Sun Dance and the buffalo-hunting ecology 
of the Great Plains people illustrates this difficulty. A summer ceremonial called 
the Sun Dance characterized all the Great Plains buffalo-hunting people. One 
might hypothesize that such a ceremony is related to the fission-fusion social 
organization that characterized the buffalo-hunting ecology of those people. But 
does one count this as one case, or several? It is likely that this ceremony originated
with the Crow and diffused to other tribes, so the various versions of the
ceremony are not independent inventions. However, each group did adopt the
ceremony, perhaps because it served the hypothesized need. Moreover, it could
be that, in the absence of diffusion, each group would have independently developed
a summer ceremonial but did not because the rate of adaptation by
diffusion is faster than independent invention (Oliver, 1962). 

On a longer temporal and spatial scale, the problem is also well illustrated 
by basic technical innovations like agriculture or iron working. Independent
inventions of these techniques were few indeed, fewer even than
the number of language-based descent groups that have subsequently adopted 
them. It seems absurd to say that we cannot really decide whether iron working 
is adaptive because all examples of iron-working technology are derived from a 
single common ancestor in Asia Minor about 3,400 years ago. Regardless of our 
answer of how many cases of iron working to count for purposes of estimating its 
adaptive value, it seems clear that language-based descent groups are largely 
irrelevant to solving this problem. We say “largely irrelevant” because it does 
seem that an association of an important adaptive innovation with a linguistic 
unit sometimes lasts long enough to carry the language area great distances, as 
with iron working and the Bantu expansion in Africa in the last millennium B.C. 
and the first millennium A.D. (Ehret, 1982); the use of abundant, but low-quality,
plant resources and the spread of Numic languages in the American
Great Basin (Bettinger and Baumhoff, 1982); and the domestication of the horse, 
invention of wheeled transport, and spread of Indo-European (Mallory, 1989). 
Note that such associations tend to persist only for a millennium or so, although 
the expansion of the innovating group tends to preserve the association. 


Conclusion 

It seems that, as regards most meme complexes, specific cultures are more like 
local populations within a species than like species. The whole human species is 
united by complex flows of ideas from one culture to another. This has always
been so, although the geographical isolation of the New World, Australia, and a
few other areas from each other and Eurasia may have substantially isolated large 
blocks of cultures on multimillennial timescales. On smaller time and space 
scales, other mechanisms of isolation and coherence do generate some patterns 
of descent that are traceable for a few millennia. 

The use of descent analysis for cultural units has a long, but controversial, 
history. Many authors claim a degree of success in reconstructing the history of 
descent of fairly large cultural units fairly far into the past. The most interesting 
outstanding question is the size and timescale of coherent units of culture. Do 
single cores in an interrelated complex have real histories that reach back five 
millennia or more? There seems to be no doubt that many small units have 
descent relationships that can be reliably inferred for this depth, but the upper 
size/time limit is not well defined by current methods. There is an ill-explored 
neutral analogy worth further work here. The cladistic revolution in systematic 
biology has sharpened concepts and built new tools for phylogenetic analysis. 
Might they be used, despite the problem of high diffusion rates among cultures 
compared with species, to help advance the resolution of genetic versus wave 
explanation of culture history? 


Was Agriculture Impossible during the Pleistocene but Mandatory during the Holocene?

A Climate Change Hypothesis

With Robert L. Bettinger

Evolutionary thinkers have long been fascinated by the origin of 
agriculture. Darwin (1874) declined to speculate on agricultural origins, but
twentieth-century scholars were bolder. The Soviet agronomist Nikolai Vavilov, 
the American geographer Carl O. Sauer, and the British archaeologist V. Gordon 
Childe wrote influential books and articles on the origin of agriculture in the 
1920s and 1930s (see Flannery, 1973, and MacNeish, 1991:4-19, for the intellectual
history of the origin of agriculture question). These explorations were
necessarily speculative and vague but stimulated interest in the question. 

Immediately after World War II, the American archaeologist Robert Braidwood
(Braidwood et al., 1983) pioneered the systematic study of agricultural
origins. From the known antiquity of village sites in the Near East and from the 
presence of wild ancestor species of many crops and animal domesticates in the 
same region, Braidwood inferred that this area was likely a locus of early domestication.
He then embarked on an ambitious program of excavation in the
foothills of the southern Zagros Mountains using a multidisciplinary team of 
archaeologists, botanists, zoologists, and earth scientists to extract the maximum 
useful information from the excavations. The availability of ¹⁴C dating gave his
team a powerful tool for determining the ages of the sites. Near Eastern sites older 
than about 15,000 B.P. excavated by Braidwood (Braidwood and Howe, 1960) 
and others were occupied by hunter-gatherers who put much more emphasis on 
hunting and unspecialized gathering than on collecting and processing the seeds 
of especially productive plant resources (Goring-Morris and Belfer-Cohen, 1998; 
Henry, 1989). Ages are given here as calendar dates before present (B.P.), where 
present is taken to be 1950, estimated from ¹⁴C dates according to Stuiver et al.'s
(1998) calibration curves. The Braidwood team showed that about 11,000 years 
ago, hunter-gatherers were collecting wild seeds, probably the ancestors of wheat
and barley, and were hunting the wild ancestors of domestic goats and sheep. At 
the 9,000 B.P. site of Jarmo, the team excavated an early farming village. Using 
much the same seed-processing technology as their hunter-gatherer ancestors 
2,000 years before, the Jarmo people were settled in permanent villages cultivating
early-domesticated varieties of wheat and barley.

Numerous subsequent investigations now provide a reasonably detailed 
picture of the origins of agriculture in several independent centers and its subsequent
diffusion to almost all of the earth suitable for cultivation. These investigations
have discovered no region in which agriculture developed earlier or
faster than in the Near East, though a North Chinese center of domestication 
of millet may prove almost as early. Other centers seem to have developed later, 
or more slowly, or with a different sequence of stages, or all three. The spread of 
agriculture from centers of origin to more remote areas is well documented for 
Europe and North America. Ethnography also gives us cases where hunters and 
gatherers persisted to recent times in areas seemingly highly suitable for agriculture,
most notably much of western North America and Australia. Attempts
to account for this rather complex pattern are a major focus of archaeology. 


Origin of Agriculture as a Natural Experiment 
in Cultural Evolution 

The processes involved in such a complex phenomenon as the origin of agriculture
are many and densely entangled. Many authors have given climate change a
key explanatory role (e.g., Reed, 1977:882-883). The coevolution of human
subsistence strategies and plant and animal domesticates must also play an important
role (e.g., Blumler and Byrne, 1991; Rindos, 1984). Hunting-and-gathering
subsistence may normally be a superior strategy to incipient agriculture
(Cohen and Armelagos, 1984; Harris, 1977), and, if so, some local factor may be 
necessary to provide the initial impetus to heavier use of relatively low-quality, 
high-processing-effort plant resources that eventually result in plant domestication.
Population pressure is perhaps the most popular candidate (Cohen,
1977). Quite plausibly, the complex details of local history entirely determine
the evolutionary sequence leading to the origin and spread of agriculture in every
region. Indeed, important advances in our understanding of the origins of agriculture
have resulted from pursuit of the historical details of particular cases
(Bar-Yosef, 1998; Flannery, 1986). 

Nonetheless, we propose that much about the origin of agriculture can be 
understood in terms of two propositions: 

Agriculture was impossible during the last glacial age. During the last glacial age, 
climates were variable and very dry over large areas. Atmospheric levels of CO₂
were low. Probably most important, last-glacial climates were characterized by 
high-amplitude fluctuations on timescales of a decade or less to a millennium. 
Because agricultural subsistence systems are vulnerable to weather extremes, and 
because the cultural evolution of subsistence systems making heavy, specialized use 
of plant resources occurs relatively slowly, agriculture could not evolve. 

In the long run, agriculture is compulsory in the Holocene epoch. In contrast
to the Pleistocene climates, stable Holocene climates allowed the evolution of
agriculture in vast areas with relatively warm, wet climates, or access to irrigation.
Prehistoric populations tended to grow rapidly to the carrying capacity set
by the environment and the efficiency of the prevailing subsistence system. Local 
communities that discover or acquire more intensive subsistence strategies will 
increase in number and exert competitive pressure on smaller populations with 
less intensive strategies. Thus, in the Holocene epoch, such intergroup competition
generated a competitive ratchet favoring the origin and diffusion of agriculture. 1

The great variation among local historical sequences in the adoption and 
diffusion of agriculture in the Holocene provides data to test our hypothesis. In 
the Near East, agriculture evolved rapidly in the early Holocene, and the region
became a center for its diffusion to the rest of western Eurasia. At the opposite extreme,
hunting-and-gathering subsistence systems persisted in most of western North 
America until European settlement, despite many ecological similarities to the 
Near East. Thus, each local historical sequence is a natural experiment in the factors 
that limit the rate of cultural evolution of more intensive subsistence strategies. For 
our hypothesis to be correct, the evolution of subsistence systems must be rapid 
compared to the time cognitively modern humans lived under glacial conditions 
without developing agriculture, but slow relative to the climate variation that 
we propose was the main impediment to subsistence intensification in the late 
Pleistocene epoch. By cultural evolution, we simply mean the change over time 
in the attitudes, skills, habits, beliefs, and emotions that humans acquire by 
teaching or imitation. In our view (Bettinger, 1991; Boyd and Richerson, 1985),
culture is best studied using Darwinian methods. We classify the causes of cultural
change into several “forces.” In a very broad sense, we recognize three
classes of forces: those due to random effects (the analogs of mutation and drift),
natural selection, and decision making (invention, individual learning, biased
imitation, and the like). The decision-making forces will tend to accelerate
cultural evolution relative to organic evolution, but by how much is a major 
issue in the explanation of agricultural origins. 


Was Agriculture Impossible in the Pleistocene? 

The Pleistocene geological epoch was characterized by dramatic glacial advances 
and retreats. Using a variety of proxy measures of past temperature, rainfall, ice 
volume, and the like, mostly from cores of ocean sediments, lake sediments, and 
ice caps, paleoclimatologists have constructed a stunning picture of climate 
deterioration over the last 14 million years (Bradley, 1999; Cronin, 1999; Lamb, 
1977; Partridge et al., 1995). The Earth’s mean temperature dropped several
degrees and the amplitude of fluctuations in rainfall and temperature increased. 
For reasons that are as yet ill understood, glaciers wax and wane in concert with 
changes in ocean circulation, carbon dioxide, methane and dust content of the 
atmosphere, and changes in average precipitation and the distribution of precipitation
(Broecker, 1995). The resulting pattern of fluctuation in climate is
very complex. As the deterioration proceeded, different cyclical patterns of
glacial advance and retreat involving all these variables have dominated the 
pattern. A 21,700-year cycle dominated the early part of the period, a 41,000-year
cycle between about 3 and 1 million years ago, and a 95,800-year cycle
during the last million years (deMenocal and Bloemendal, 1995). Milankovitch’s
hypothesis that these variations are driven by changes in the earth’s orbit, and 
hence the solar radiation income in the different seasons and latitudes, fits the 
estimated temperature variation well, although doubts remain (Cronin, 1999: 
185-189). 

Rapid Climate Variation in the Late Pleistocene 

The long timescale climate change associated with the major glacial advances and 
retreats is not directly relevant to the origins of agriculture because it occurs so 
slowly compared to the rate at which human populations adapt by cultural 
evolution. However, the ice ages also have great variance in climate at much 
shorter timescales. For the last 400,000 years, very high-resolution climate proxy 
data are available from ice cores taken from the deep ice sheets of Greenland and 
Antarctica. Resolution of events lasting little more than a decade is possible in 
Greenland ice 80,000 years old, improving to monthly resolution 3,000 years 
ago. During the last glacial, the ice core data show that the climate was highly 
variable on timescales of centuries to millennia (Clark, Alley, and Pollard, 1999;
Dansgaard et al., 1993; Ditlevsen, Svensmark, and Johnsen, 1996; GRIP 1993).
Figure 17.1 shows data from the GRIP Greenland core. The δ¹⁸O curve is a
proxy for temperature; less negative values are warmer. Ca²⁺ is a measure of
the amount of dust in the core, which in turn reflects the prevalence of dust- 
producing arid climates. The last glacial period was arid and extremely variable 
compared to the Holocene. Sharp millennial-scale excursions occur in estimated 
temperatures, atmospheric dust, and greenhouse gases. The intense variability of 
the last glacial carries right down to the limits of the nearly 10-year resolution 
of the ice core data. The highest resolution records in Greenland ice (and lower 
latitude records) show that millennial-scale warmings and coolings often began 
and ended very abruptly and were often punctuated by quite large spikes of 
relative warmth and cold with durations of a decade or two (e.g., Grafenstein
et al., 1999). Figure 17.2 shows Ditlevsen et al.’s (1996) analysis of a Greenland
ice core. Not only was the last glacial age much more variable on timescales of a 
century and a half or more (150-year low-pass filter) but also on much shorter 
timescales (150-year high-pass filter). Even though diffusion and thinning within
the ice core progressively erase high-frequency variation in the core (visible
as the narrowing with increasing age of the 150-year high-pass data in figure 17.2), 
the shift from full glacial conditions about 18,000 years ago to the Holocene 
interglacial is accompanied by a dramatic reduction in variation on timescales 
shorter than 150 years. The Holocene (the last relatively warm, ice-free 11,600 
years) has been a period of very stable climate, at least by the standards of the 
last glacial age. 2 
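
The caption to figure 17.2 notes that the spectral filtering used there is roughly equivalent to taking a 150-year moving average (the low-pass series) and subtracting it from the original data (the high-pass series). The sketch below illustrates only that rough equivalent, not the spectral method of Ditlevsen et al. (1996). It assumes one sample per year, and the two series are synthetic: the names `synthetic_record`, `glacial_like`, and `holocene_like`, along with the amplitudes chosen, are hypothetical stand-ins and are not GRIP data.

```python
import math
import random


def moving_average(values, window):
    """Centered moving average; near the edges, average whatever samples exist."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def split_low_high(values, window):
    """Low-pass series = moving average; high-pass series = original minus low-pass."""
    low = moving_average(values, window)
    high = [v - l for v, l in zip(values, low)]
    return low, high


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def synthetic_record(n_years, slow_amp, fast_sd, seed):
    """Hypothetical annual record: a slow 1,500-year wave plus year-to-year noise."""
    rng = random.Random(seed)
    return [
        slow_amp * math.sin(2 * math.pi * t / 1500) + rng.gauss(0, fast_sd)
        for t in range(n_years)
    ]


# Invented amplitudes, chosen only so that one record is noisier than the other.
glacial_like = synthetic_record(3000, slow_amp=4.0, fast_sd=2.0, seed=1)
holocene_like = synthetic_record(3000, slow_amp=0.5, fast_sd=0.5, seed=2)

for name, series in [("glacial-like", glacial_like), ("Holocene-like", holocene_like)]:
    low, high = split_low_high(series, window=150)
    print(name, "low-pass variance:", round(variance(low), 2),
          "high-pass variance:", round(variance(high), 2))
```

Comparing the variance of each filtered series is one simple way to express the contrast the figure draws: a record that is more variable on both long and short timescales shows larger variance in both the low-pass and the high-pass series.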

The climate fluctuations recorded in high-latitude ice cores are also recorded 
at latitudes where agriculture occurs today. Sediments overlain by anoxic water 

Figure 17.1. Profiles of a temperature index, δ¹⁸O, and an index of dust content, Ca²⁺,
from the GRIP Greenland ice core. 200-year means are plotted. The parts of the GRIP
profile representing the last interglacial may have been affected by ice flow so their
interpretation is uncertain (Johnsen et al., 1997). Note the high-amplitude, high-frequency
variation in both the temperature and dust records during the last glacial
age. The Holocene epoch is comparatively much less variable. Plotted from original
data obtainable at: ftp://ftp.ngdc.noaa.gov/paleo/icecore/greenland/summit/grip/isotopes/gripd18o.txt
and ftp://ftp.ngdc.noaa.gov/paleo/icecore/greenland/summit/grip/chem/ca.txt.


that inhibits sediment mixing by burrowing organisms are a source of low- and 
mid-latitude data with a resolution rivaling ice cores. Events recorded in North 
Atlantic sediment cores are closely coupled to those recorded in Greenland ice 
(Bond et ah, 1993), but so are records distant from Greenland. Hendy and 
Kennett (2000) report on water temperature proxies from sediment cores from 
the often-anoxic Santa Barbara Basin just offshore of central California. These data
show millennial- and submillennial-scale temperature fluctuations from 60-18

Figure 17.2. High-resolution analysis of the GRIP ice core δ¹⁸O data by Ditlevsen
et al. (1996). The low-pass filtered data show that the Holocene epoch is much
less variable than the Pleistocene on timescales of 150 years and longer. The high-pass
filtered data show that the Pleistocene was also much more variable on timescales less than
150 years. The high- and low-pass filtering used spectral analytic techniques. These are
roughly equivalent to taking a 150-year moving average of the data to construct the
low-pass filtered series and subtracting the low-pass filtered series from the original data
to obtain the high-pass filtered record. Since layer thinning increasingly affects deeper
parts of the core by averaging variation on the smallest scales, the high-pass variance is
reduced in the older parts of the core. In spite of this effect, the Pleistocene/Holocene
transition is very strongly marked.



thousand years ago with an amplitude of about 8°C, compared to fluctuations of 
about 2°C in the Holocene epoch. As in the Greenland cores, the millennial-scale 
events often show very abrupt onsets and terminations and are often punctuated 
by brief spikes of warmth and cold. Schulz, von Rad, and Erlenkeuser (1998) 
analyzed organic matter concentrations in sediment cores at oxygen minimum 
depths from the Arabian Sea deposited over the past 110 thousand years. The variation
in organic matter deposited is thought to reflect the strength of upwelling,
driven by changes in the strength of the Arabian Sea monsoon. AMS ¹⁴C dating of
both the Arabian Sea and Santa Barbara cores gives good time control in the 
upper part of the record, and the climate proxy variation is easily fit to Greenland 
ice millennial-scale interstadial-stadial oscillations. Allen, Watts, and Huntley 
(2000) examine the pollen profiles from the laminated sediments of Lago Grande
di Monticchio in southern Italy. Changes in the proportion of woody taxa in the
core were dominated by large-amplitude changes near the limits of resolution of 
the data, about a century. The millennial-scale variations in this core also correlate
with the Greenland record. Peterson et al. (2000) show that proxies for the
tropical Atlantic hydrologic cycle have a strong millennial-scale signal that likewise
closely matches the Greenland pattern.

Proxy records apparently showing the ultimate Younger Dryas
millennial-scale cold episode, strongly expressed in the North Atlantic records
12,600-11,600 B.P., have been reported from all over the world, including southern
German oxygen isotope variations (Grafenstein et al., 1999), organic geochemistry
of the Cariaco Basin, Venezuela (Werne et al., 2000), New Zealand
pollen (Newnham and Lowe, 2000), and California pollen (West, 2000). The
Younger Dryas episode has received disproportionate attention because the time
period is easily dated by ¹⁴C and is sampled by many lake and mountain glacier
cores too short to reach older millennial-scale events. As Cronin (1999:202-221) 
notes, the Younger Dryas is frequently detected in a diverse array of Northern 
Hemisphere climate proxies from all latitudes. The main controversy involves 
data from the Southern Hemisphere, where proxy data often do not show a cold 
period coinciding with the Younger Dryas, although some records show a similar 
Antarctic Cold Reversal just antedating the Northern Hemisphere Younger 
Dryas (Bennett, Haberle, and Lumley, 2000). 

Other records provide support for millennial-scale climate fluctuations 
during the last glacial age that cannot be convincingly correlated with the 
Greenland ice record. Cronin (1999:221-236) reviews records from the deep 
tropical Atlantic, Western North America, Florida, China, and New Zealand. 
Recent notable additions to his catalog include southern Africa (Shi et al., 2000),
the American Midwest (Dorale et al., 1998), the Himalayas (Richards, Owen,
and Rhodes, 2000), and northeastern Brazil (Behling et al., 2000). Clapperton 
(2000) gives evidence for millennial-scale glacial advances and retreats from 
most of the American cordillera—Alaska and western North America through 
tropical America to the southern Andes. 

While the complex feedback processes operating in the atmosphere-biosphere- 
ocean system are not completely understood (Broecker 1995:241-270), plausible 
physical mechanisms could have linked temperature fluctuation in both hemispheres.
For example, Broecker and Denton (1989) proposed an explanation
based upon the effects of glacial meltwater on the deep circulation of the North 
Atlantic. Today, cold, salty water from the surface of the North Atlantic is the 
source of about half of the global ocean’s deep water. This large outflow of deep 
water currently must be balanced by an equally enormous inflow of warm surface 
and intermediate water into the high North Atlantic. If glacial meltwater lowered 
the salinity of the North Atlantic and interrupted the flow of deep water, the 
whole coupled atmosphere-ocean circulation system of the world would be 
perturbed. Broecker and Denton’s hypothesis explains how the Northern and
Southern Hemisphere temperature and ice fluctuations could have been in phase
even though the direct effects of orbital-scale variation on the two hemispheres 
are out of phase. 

Impacts of Millennial-Scale and Submillennial-Scale 
Variation on Agriculture 

We believe that high-frequency climate and weather variation would have made 
the evolution of methods for intensive exploitation of plant foods extremely 
difficult. Holocene weather extremes significantly affect agricultural production 
(Lamb, 1977). For example, the impact of the Little Ice Age (400-150 B.P.) on 
European agriculture was quite significant (Grove, 1988). The Little Ice Age 
is representative of the Holocene millennial-scale variation that is very much 
more muted than last-glacial events of similar duration. Extreme years during 
the Little Ice Age caused notable famines and such extremes would have been 
more exaggerated and more frequent during last glacial times. The United Nations
Food and Agriculture Organization’s (2000) Global Information and Early
Warning System on Food and Agriculture gives a useful qualitative sense for the
current impacts of interannual weather variation on food production. Quantitative
estimates of current crop losses due to weather variation are difficult to
make, but reasonable estimates run 10 percent on a country-wide basis (Gommes,
1999) and perhaps 10-40 percent on a state basis in Mexico, depending upon 
mean rainfall (Eakin, 2000). Gommes believes that weather problems account 
for half of all crop losses. 

If losses in the Holocene are this high and if high-frequency climate variation 
in the last glacial age increased at lower latitudes roughly as much as at 
Greenland, a hypothetical last-glacial farming system would face crippling losses 
in more years than not. Devastating floods, droughts, windstorms, and other 
climate extremes, which we experience once a century, might have occurred 
once a decade. In the tropics, rainfall was highly variable (Broecker, 1996). Few 
years would be suitable for good growth of any given plant population. Even 
under relatively benign Holocene conditions agriculturalists and intensive plant 
collectors have to make use of risk-management strategies to cope with yield 
variation. Winterhalder and Goland (1997) use optimal foraging analysis to argue
that the shift from foraging to agriculture would have required a substantial
shift from minimizing risk by sharing to minimizing risk by field dispersal. Some 
ethnographically known Eastern Woodland societies that mixed farming and 
hunting, for example, the Huron, seemed not to have made this transition and to 
have suffered frequent catastrophic food shortages. Storage by intensive plant 
collectors and farmers is an excellent means of meeting seasonal shortfalls, but is 
a marginal means of coping with interannual risk, much less multiyear shortfalls 
(Belovsky, 1987:60). 3 

If Winterhalder and Goland are correct that considerable field dispersal is 
required to manage Holocene yield risks, it is hard to imagine that further field 
division would have been successful at coping with the much larger amplitude fluctuations
that occurred during the last glacial age. We expect that opportunism
was the most important strategy for managing the risks associated with plant
foods during the last glacial age. Annual plants have dormant seed that spreads 
their risk of failure over many years, and perennials vary seed output or storage 
organ size substantially between years as weather dictates. In a highly variable 
climate, the specialization of exploitation on one or a few especially promising 
species would be highly unlikely, because “promise” in one year or even for a 
decade or two would turn to runs of years with little or no success. However, 
most years would likely be favorable for some species or another, so generalized 
plant-exploitation systems are compatible with highly variable climates. The 
acorn-reliant hunter-gatherers of California, for example, used several kinds of 
oak, gathering less favored species when more favored ones failed (Baumhoff, 
1963:table 2). Reliance on acorns demanded this generalized pattern of species 
diversification because the annual production of individual trees is highly variable 
from year to year, being correlated within species but independent between 
species (Koenig et al., 1994). Pleistocene hunter-gatherer systems must have been
even more diversified, lacking the kind of commitment to a single resource cat¬ 
egory (acorns) observed in California. 

The evolution of intensive resource-use systems like agriculture is a rela¬ 
tively slow process, as we document. If ecological timescale risks could be 
managed some way, or if some regions lacked the high-frequency variation de¬ 
tected by the as yet few high-resolution climate proxy records, the evolution of 
sophisticated intensive strategies would still be handicapped by millennial-scale 
variation. Plant and animal populations responded to climatic change by dramat¬ 
ically shifting their ranges, but climate change was significant on the timescales 
shorter than those necessary for range shifts to occur. As a result, last-glacial 
natural communities must have always been in the process of chaotic reorga¬ 
nization as the climate varied more rapidly than they could reach equilibrium. 
The pollen record from the Mediterranean and California illustrates how much 
more dynamic plant communities were during the last glacial age (Allen et al.,
1999; Heusser, 1995). Pleistocene fossil beetle faunas change even more rapidly
than plants because many species, especially generalist predators, change their 
ranges more rapidly than plants. Hence, they are better indicators of the eco¬ 
logical impacts of the abrupt, large-amplitude climate changes recorded by the 
physical climate proxies from the last glacial (Coope, 1987). 

Could the evolution of intensive plant-exploitation systems have tracked 
intense millennial- and submillennial-scale variation? Plant food-rich diets take 
considerable time to develop. Plant foods are generally low in protein and often 
high in toxins. Some time is required to work out a balanced diet rich in plant 
foods, for example, by incorporating legumes to replace part of the meat in diets. 
Whether intensification and agriculture always lead to health declines due to nu¬ 
tritional inadequacy is debatable, but the potential for them to do so absent 
sometimes-subtle adaptations is clear (Cohen and Armelagos, 1984; Katz, Hediger, 
and Valleroy, 1974). The seasonal round of activities has to be much modified, 
and women’s customary activities have to be given more prominence relative to 
men’s hunting. Changes in social organization either by evolution in situ or by 
borrowing tend to be slow (Bettinger and Baumhoff, 1982; North and Thomas, 
1973). We doubt that even sophisticated last-glacial hunter-gatherers would
have been able to solve the complex nutritional and scheduling problems
associated with a plant-rich diet while coping with unpredictable high-amplitude
change on timescales shorter than the equilibration time of plant migrations and 
shorter than actual Holocene trajectories of intensification. In keeping with our 
argument, the direct archaeological evidence suggests that people began to use 
intensively the technologies that underpinned agriculture only after about 
15,000 B.P. (Bettinger, 2000). 

Carbon Dioxide Limitation of Photosynthesis 

Plant productivity was also limited by lower atmospheric CO2 during the last
glacial. The CO2 content of the atmosphere was about 190 ppm during the last
glacial age, compared to about 250 ppm at the beginning of the Holocene (figure
17.3). Photosynthesis on earth is CO2-limited over this range of variation
(Cowling and Sykes, 1999; Sage, 1995). Beerling and Woodward (1993; see also
Beerling et al., 1993) have shown that fossil leaves from the last glacial age have
higher stomatal density, a feature that allows higher rates of gas exchange needed
to acquire CO2 under more limiting conditions. This higher stomatal conductance
also causes higher transpiration water losses per unit CO2 fixed, exacerbating
the aridity characteristic of glacial times. Beerling (1999) estimates the
total organic carbon stored on land as a result of photosynthesis during the Last 
Glacial Maximum using a spatially disaggregated terrestrial plant production 
model coupled to two different global climate models to provide the environ¬ 
mental forcing for plant growth. The model results differ substantially, one indi¬ 
cating a 33 percent lower, and the other a 60 percent lower, terrestrial carbon 
store at the Last Glacial Maximum compared to the Holocene. Mass-balance 
calculations based on stable isotope geochemistry also indicate a qualitatively 
large drop, but uncertainties regarding terrestrial δ13C lead to a similarly large
range of estimates. Low mean productivity, along with greater variance in pro¬ 
ductivity, would have greatly decreased the attractiveness of plant resources 
during the last glacial age. 

Lower average rainfall and carbon dioxide during the last glacial age reduced 
the area of the earth’s surface suitable for agriculture (Beerling, 1999). Diamond 
(1997) argues that the rate of cultural evolution is more rapid when innovations 
in local areas can be shared by diffusion. Thus, a reduction in the area suitable for 
agriculture and the isolation of suitable areas from one another will have a 
tendency to reduce the rate of intensification and make the evolution of agri¬ 
culture less likely in any given unit of time. Since the slowest observed rates of 
intensification in the Holocene epoch failed to result in agriculture until the 
European invasions of the last few hundred years, a sufficient slowing of the rate 
of evolution of subsistence could conceivably in itself explain the failure of 
agriculture to emerge before the Holocene. A slower rate of cultural evolution 
would also tend to prevent the rapid adaptation of intensive strategies during any 
favorable locales or periods that might have existed during the last glacial. 

Figure 17.3. Panel A shows the curve of atmospheric CO2 as estimated from gas
bubbles trapped in Antarctic glacial ice. Data from Barnola et al. (1987). Panel B
summarizes responses of several plant species to experimental atmospheres containing
various levels of CO2. Based on data summarized by Sage (1995).

On present evidence we cannot determine whether aridity, low CO2 levels,
millennial-scale climate variability, or submillennial-scale weather variation was
the main culprit in preventing the evolution of agriculture. Low CO2 and climate
variation would handicap the evolution of dependence on plant foods every¬ 
where and were surely more significant than behavioral or technological ob¬ 
stacles. Hominids evolved as plant-using omnivores (Milton, 2000), and the basic 
technology for plant exploitation existed at least 10 thousand years before the 
Holocene (Bar-Yosef, 1998). At least in favorable localities, appreciable use 
seems to have been made of plant foods, including large-seeded grasses, well
back into the Pleistocene (Kislev, Nadel, and Carmi, 1992). Significantly, we
believe, the use of such technology over spans of last-glacial time that were 
sufficient for successive waves of intensification of subsistence in the Holocene 
led to only minor subsistence intensification, compared to the Mesolithic, 
Neolithic, and their ever-more-intensive successors. 

Subsistence Responses to Amelioration 

As the climate ameliorated, hunter-gatherers in several parts of the world began 
to exploit locally abundant plant resources more efficiently, but only, current 
evidence suggests, during the Bølling-Allerød period of near-interglacial warmth
and stability. The Natufian sequence in the Levant is the best-studied and so far
earliest example (e.g., Bar-Yosef and Valla, 1991). One last siege of glacial
climate, the Younger Dryas from 12,900 B.P. until ca. 11,600 B.P., reversed these
trends during the Late Natufian (e.g., Goring-Morris and Belfer-Cohen, 1998).
The Younger Dryas climate was appreciably more variable than the preceding
Allerød-Bølling and the succeeding Holocene (Grafenstein et al., 1999; Mayewski
et al., 1993). The 10 abrupt, short, warm-cold cycles that punctuate the Younger
Dryas ice record were perhaps felt as dramatic climate shifts all around the
world. After 11,600 B.P., the Holocene period of relatively warm, wet, stable,
CO2-rich environments began. Subsistence intensification and eventually
agriculture followed. Thus, while not perfectly instantaneous, the shift from glacial
to Holocene climates was a very large change and took place much more rapidly 
than cultural evolution could track. 

Might we not expect agriculture to have emerged in the last interglacial 
130,000 years ago or even during one of the even older interglacials? No ar¬ 
chaeological evidence has come to light suggesting the presence of technologies 
that might be expected to accompany forays into intensive plant collecting or 
agriculture at this time. Anatomically modern humans may have appeared in 
Africa as early as 130,000 years ago (Klein 1999: ch. 7), but they were not 
behaviorally modern. Humans of the last interglacial were uniformly archaic in 
behavior. Very likely, then, the humans of the last interglacial were neither 
cognitively nor culturally capable of evolving agricultural subsistence. However, 
climate might also explain the lack of marked subsistence intensification during 
previous interglacials. Ice cores from the thick Antarctic ice cap at Vostok show 
that each of the last four interglacials over the last 420,000 years was charac¬ 
terized by a short, sharp peak of warmth, rather than the 11,600-year-long stable 
plateau of the Holocene (Petit et al., 1999). Further, the GRIP ice core suggests 
the last interglacial (130,000-80,000 B.P.) was more variable than the Holocene, 
although its lack of agreement with a nearby replicate core for this time period 
makes this interpretation tenuous (Johnsen et al., 1997). On the other hand, the
atmospheric concentration of CO2 was higher in the three previous interglacials
than during the Holocene and was stable at high levels for about 20,000 years 
following the warm peak during the last interglacial. The highly continental 
Vostok site unfortunately does not record the same high-frequency variation in 
the climate as most other proxy climate records, even those in the southern 
hemisphere (Steig et al., 1998). Some northern hemisphere marine and terrestrial
records suggest that the last interglacial was highly variable, while other data
suggest a Holocene-length period of stable climates ca. 127,000-117,000 B.P.
(Frogley, Tzedakis, and Heaton, 1999). Better data on the high-frequency part of
the Pleistocene beyond the reach of the Greenland ice cores are needed to test
hypotheses about events antedating the latest Pleistocene. Long marine cores from
areas of rapid sediment accumulation are beginning to reveal the millennial-scale 
record from previous glacial-interglacial cycles (McManus, Oppo, and Cullen, 
1999). At least the last five glacials have millennial-scale variations much like the 
last glacial. The degree of fluctuations during previous interglacials is still not 
clear, but at least some proxy data suggest that the Holocene has been less 
variable than earlier interglacials (Poli, Thunell, and Rio, 2000). 


During the Holocene, Was Agriculture Compulsory 
in the Long Run? 

Once a more productive subsistence system is possible, it will, over the long run, 
replace the less-productive subsistence system that preceded it. The reason is 
simple: all else being equal, any group that can use a tract of land more efficiently 
will be able to evict residents that use it less efficiently (Boserup, 1981; Sahlins 
and Service, 1960:75-87). More productive uses support higher population 
densities, or more wealth per capita, or both. An agricultural frontier will tend to 
expand at the expense of hunter-gatherers as rising population densities on the 
farming side of the frontier motivate pioneers to invest in acquiring land from 
less-efficient users. Farmers may offer hunter-gatherers an attractive purchase 
price, a compelling idea about how to become richer through farming, or a dis¬ 
mal choice of flight, submission, or military defense at long odds against a more 
numerous foe. Early farmers (and other intensifiers more generally) are also 
liable to target opportunistically high-ranked game and plant resources essential 
to their less-intensive neighbors, exerting scramble competitive pressure on them 
even in the absence of aggressive measures. Thus, subsistence improvement 
generates a competitive ratchet as successively more land-efficient subsistence 
systems lead to population growth and labor intensification. Locally, hunter- 
gatherers may win some battles (e.g., in the Great Basin; Madsen, 1994), but in 
the long run the more intensive strategies will win wherever environments are 
suitable for their deployment. 

The archaeology supports this argument (Bettinger, 2000). Societies in all 
regions of the world undergo a very similar pattern of subsistence efficiency in¬ 
crease and population increase in the Holocene, albeit at very different rates. 
Holocene hunter-gatherers developed local equilibria that, while sometimes lasting 
for thousands of years, were almost always replaced by more intensive equilibria. 


Alternative Hypotheses Are Weak 

Aside from other forms of the climate-change hypotheses described,
archaeologists have proposed three prominent hypotheses (climate stress, population
growth, and cultural evolution) to explain the timing of agricultural origins.
They were formulated before the nature of the Pleistocene-Holocene transition
was understood but are still the hypotheses most widely entertained by
archaeologists (MacNeish, 1991). None of the three provides a close fit with the
empirical evidence or with theory.

Climate Stress Was First Too Common, Then Too Rare 

Childe (1951) proposed that terminal Pleistocene desiccation stressed forager 
populations and led to agriculture. Wright (1977) argued that Holocene climate 
amelioration brought pre-adapted plants into the Fertile Crescent areas where 
agriculture first evolved. Bar-Yosef (1998) and Moore and Hillman (1992) ar¬ 
gue that Late Natufian sedentary hunter-gatherers probably undertook the first 
experiments in cultivation under the pressure of the Younger Dryas climate
deterioration. Natufian peoples lived in settled villages and exploited the wild
ancestors of wheat and barley beginning in the Allerød-Bølling warm period
(14,500-12,900 B.P.) (Henry, 1989) and then reverted to mobile
hunting-and-gathering during the sharp, short Younger Dryas climate deterioration
(12,900-11,600 B.P.), the last of the high-amplitude fluctuations that were
characteristic of the last glacial (Bar-Yosef and Meadow, 1995; Goring-Morris and
Belfer-Cohen, 1997). Post-Natufian cultures began to domesticate the same species as
warm and stable conditions returned after the Younger Dryas, around 11,600 B.P.
Unfortunately, a flat spot in the 14C/calendar-year calibration curve makes
precise dating difficult for the most critical several hundred years centered on 
11,600 B.P. (Fiedel, 1999). As a component of an explanation of a local sequence 
of change, such hypotheses may well be correct. Yet they beg the question of 
why the 15 or so similar deteriorations and ameliorations of the last glacial age 
did not anywhere lead to agriculture or why most of the later origins of agri¬ 
culture occurred in the absence of Younger Dryas-scale deteriorations. Note also 
that, in principle, populations can adjust downward to lower carrying capacities 
through famine mortality even more quickly than they can grow up to higher 
ones. Such hypotheses cannot, we believe, explain the longer time- and larger 
spatial-scale problem of the absence of agriculture in the Pleistocene and its 
multiple origins and rapid spreads in the Holocene. 

The details of subsistence responses to the Younger Dryas in the areas of 
early origins of agriculture will eventually produce a sharp test of the variability 
hypothesis. We suggest that the late Natufian de-intensification in response to 
the Younger Dryas was a retreat from the trend leading to agriculture and was 
unlikely to have produced the first steps toward domestication. More likely, the 
late Natufian preserved remnants of earlier, more intensive Natufian technology 
and social organization that served to start the Levantine transition to agriculture 
at an unusually advanced stage after the Younger Dryas ended. Events in the 
Younger Dryas time period also provide an opportunity to investigate the effects 
of CO2 concentration partly independently of climate variability. The rise in
CO2 concentration in the atmosphere began two to three millennia before
temperatures began to rise and continued to increase steadily through the Younger
Dryas (Sowers and Bender, 1995). The Younger Dryas period de-intensification
of the Natufian suggests an independent effect of millennial or submillennial
variability. 

Population Growth Has the Wrong Timescale 

Cohen’s (1977) influential book argued that slowly accumulating global-scale 
population pressure was responsible for the eventual origins of agriculture be¬ 
ginning at the 11,600 B.P. time horizon. He imagines, quite plausibly, that sub¬ 
sistence innovation is driven by increases in population density, but, implausibly 
we believe, that a long, slow buildup of population gradually drove people to 
intensify subsistence systems to relieve shortages caused by population growth, 
eventually triggering a move to domesticates. Looked at one way, population 
pressure is just the population growth part of the competitive ratchet. However, 
this argument fails to explain why pre-agricultural hunter-gatherer intensifica¬ 
tion and the transition to agriculture began in numerous locations after 11,600 
years ago (Hayden, 1995). Assuming that humans were essentially modern by 
the Upper Paleolithic, they would have had 30,000 years to build up a popu¬ 
lation necessary to generate pressures for intensification. Given any reasonable 
estimate of the human intrinsic rate of natural increase under hunting-and-gathering
conditions (somewhat less than 1% yr⁻¹ to 3% yr⁻¹), populations
substantially below carrying capacity will double in a century or less, as we will 
see in the models that follow. 
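
As a rough check on the doubling claim, here is a worked calculation under the simplifying assumption (made here only for illustration) that a population far below carrying capacity grows approximately exponentially at rate r:

$$N(t) = N_0 e^{rt} \quad\Longrightarrow\quad t_{\text{double}} = \frac{\ln 2}{r}$$

$$r = 0.01\ \text{yr}^{-1}:\ t_{\text{double}} \approx 69\ \text{years}; \qquad r = 0.03\ \text{yr}^{-1}:\ t_{\text{double}} \approx 23\ \text{years}$$

On this approximation a doubling time of exactly one century corresponds to r of roughly 0.007 yr⁻¹, which sits at the "somewhat less than 1%" end of the quoted range.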

A Basic Model of Population Pressure 

Since the population explanation for agriculture and other adaptive changes 4 
connected with increased subsistence efficiency remains very popular among 
archaeologists, we take the time here to examine its weakness formally. The 
logistic equation is one simple, widely used model of population growth. The
rate of change of population density, N, is given by:

$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right) \tag{1}$$

where r is the “intrinsic rate of natural increase”—the rate of growth of popu¬ 
lation density when there is no scarcity—and K is the “carrying capacity,” the 
equilibrium population density when population growth is halted by density- 
dependent checks. In the logistic equation, the level of population pressure is 
given by the ratio N/K. When this ratio is equal to zero, the population grows at 
its maximum rate; there is no population pressure. When the ratio is one, 
density dependence prevents any population growth at all. It is easy to solve this 
equation and calculate the length of time necessary to achieve any level of
population pressure, π = N/K:

$$T(\pi) = \frac{1}{r}\ln\left(\frac{\pi\,(1-\pi_0)}{\pi_0\,(1-\pi)}\right) \tag{2}$$

where π₀ is the initial level of population pressure. Let us very conservatively
assume that the initial population density is only 1 percent of what could be
sustained with the use of simple agriculture and that the maximum rate of
increase of human populations unconstrained by resource limitation is 1 percent
per year. Under these assumptions, the population will reach 99 percent of the
maximum population pressure (i.e., π = .99) in only about 920 years. Serendipitous
inventions (e.g., the bow and arrow) that increase carrying capacity do
not fundamentally alter this result. For example, only the rare single invention is 
likely to so much as double carrying capacity. If such an invention spreads within 
a population that is near its previous carrying capacity, it will still face half the 
maximum population pressure and thus significant incentive for further inno¬ 
vation. At an r of 1 percent, such an innovating population will again reach 99 
percent of the maximum population pressure in 459 years. 
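
The two waiting times quoted above can be reproduced with a few lines of code. The sketch below assumes the standard closed-form solution of the logistic equation, in which the elapsed time equals the log of the ratio of final to initial odds, π/(1 − π), divided by r. The function and variable names are ours and purely illustrative.

```python
import math

def time_to_pressure(pi_target, pi_0, r):
    """Years for population pressure N/K to rise from pi_0 to pi_target.

    Assumes plain logistic growth, dN/dt = r*N*(1 - N/K), whose solution
    gives T = (1/r) * ln( [pi/(1-pi)] / [pi_0/(1-pi_0)] ).
    """
    if not (0.0 < pi_0 < pi_target < 1.0):
        raise ValueError("need 0 < pi_0 < pi_target < 1")
    odds_ratio = (pi_target / (1.0 - pi_target)) / (pi_0 / (1.0 - pi_0))
    return math.log(odds_ratio) / r

# Start at 1 percent of carrying capacity, r = 1 percent per year.
t_from_one_percent = time_to_pressure(0.99, 0.01, 0.01)

# An invention doubles K for a population at its old K: pressure restarts at 0.5.
t_after_doubling_k = time_to_pressure(0.99, 0.50, 0.01)

print(round(t_from_one_percent), round(t_after_doubling_k, 1))
```

Under these assumptions the first figure comes out near 920 years and the second near 460 years, in line with the values given in the text. Because the waiting time depends only on the logarithm of the odds ratio, even large changes in the starting point shift it modestly.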

One might think that this result is an artifact of the very simple model of 
population growth. However, it is easy to add much realism to the model 
without any change of the basic result. In Appendix 1 we show that a more 
realistic version of the logistic equation actually leads to even more rapid growth 
of population pressure. 


Allowing for Dispersal 

Once, after listening to one of us propound this argument, a skeptical archae¬ 
ologist replied, “But you’ve got to fill up all of Asia, first.” This understandable 
intuitive response betrays a deep misunderstanding of the timescales of expo¬ 
nential growth. Suppose that the initial population of anatomically modern 
humans was only about 10⁴ and that the carrying capacity for hunter-gatherers is
very optimistically 1 person per square kilometer. Given that the land area of the
Old World is roughly 10⁸ km², π₀ = 10⁴/10⁸ = 10⁻⁴. Then using equation 2 and
again assuming r = .01, Eurasia will be filled to 99 percent of carrying capacity in
about 1,400 years. The difference between increasing population pressure by a
factor of 100 and by a factor of 10,000 is only about 500 years!
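
The small gap between the two filling times follows from the logarithm in the logistic solution. A brief worked version, assuming equation 2 is the standard closed-form solution of the logistic equation (the symbol T for elapsed time is our notation):

$$T(\pi) = \frac{1}{r}\ln\left(\frac{\pi\,(1-\pi_0)}{\pi_0\,(1-\pi)}\right)$$

$$T\big|_{\pi_0 = 10^{-4}} - T\big|_{\pi_0 = 10^{-2}} = \frac{1}{r}\ln\left(\frac{(1-10^{-4})/10^{-4}}{(1-10^{-2})/10^{-2}}\right) = \frac{1}{0.01}\ln\frac{9999}{99} \approx 100 \times 4.62 \approx 460\ \text{years}$$

The target level π cancels out of the difference, so starting from a population a hundred times smaller adds only about ln(101)/r years, whatever level of pressure is taken as the endpoint.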

Moreover, this calculation seriously overestimates the amount of time that 
will pass before any segment of an expanding Eurasian population will experi¬ 
ence population pressure because populations will approach carrying capacity 
locally long before the entire continent is filled with people. R. A. Fisher (1937) 
analyzed the following partial differential equation that captures the interaction 
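between population growth and dispersal in space:

$$\frac{\partial N(x)}{\partial t} = \underbrace{rN(x)\left(1 - \frac{N(x)}{K}\right)}_{\text{population growth}} + \underbrace{d\,\frac{\partial^2 N(x)}{\partial x^2}}_{\text{dispersal}} \tag{3}$$

Here N(x) is the population density at a point x in a one-dimensional environment.
Equation (3) says that the rate of change of population density in a particular
place is equal to the population growth there plus the net effect of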
random, density-independent dispersal into and out of the region. The parameter 
d measures the rate of dispersal and is equal to the standard deviation of the 
distribution of individual dispersal distances. In an environment that is large 
compared to d, a small population rapidly grows to near carrying capacity at its
initial location, and then, as shown in figure 17.4 (redrawn from Ammerman and
Cavalli-Sforza, 1984), begins to spread in a wave-like fashion across the
environment at a constant rate. Thus, at any given point in space, populations move
from the absence of population pressure to high population pressure as the wave
passes over that point. Figure 17.4 shows the pattern of spread for r = .01 and
d ≈ 30. With these quite conservative values, it takes less than 200 years for the
wave front to pass from low population pressure to high population pressure.
More realistic models that allow for density-dependent migration also yield a
constant, wave-like advance of population (Murray, 1989), and although the
rates vary, we believe that the same qualitative conclusion will hold.

Figure 17.4. A numerical simulation of Fisher’s equation showing that after an initial
period, population spreads at a constant rate so that at any point in space population
pressure increases to its maximum in less than 500 years for reasonable parameter
values. (Redrawn from Ammerman and Cavalli-Sforza, 1984).
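
The wave-like spread described here can be explored with a small numerical experiment. The sketch below integrates a Fisher-type equation (logistic growth plus a diffusion term) with an explicit finite-difference scheme. The diffusion coefficient D, the grid spacing, the time step, and the probe location are hypothetical values chosen for illustration. They are not the settings behind figure 17.4, and how D relates to the dispersal parameter d is left open.

```python
import math

def simulate_fisher(r=0.01, D=100.0, K=1.0, dx=20.0, dt=1.0,
                    n_cells=400, n_steps=2000, probe_cell=100):
    """Explicit finite differences for dN/dt = r*N*(1 - N/K) + D*d2N/dx2.

    Units are illustrative: years for time, km for distance. Boundaries are
    no-flux. Returns the front track and N/K at one probe cell over time.
    """
    assert D * dt / dx**2 <= 0.5, "explicit scheme would be unstable"
    N = [K if i < 5 else 0.0 for i in range(n_cells)]  # small founding patch
    front, probe = [], []
    for step in range(1, n_steps + 1):
        new = N[:]
        for i in range(n_cells):
            left = N[i - 1] if i > 0 else N[i]
            right = N[i + 1] if i < n_cells - 1 else N[i]
            laplacian = (left - 2.0 * N[i] + right) / dx**2
            growth = r * N[i] * (1.0 - N[i] / K)
            new[i] = N[i] + dt * (growth + D * laplacian)
        N = new
        # Front position: first cell where density is below half of K.
        half = next((i for i, v in enumerate(N) if v < K / 2.0), n_cells)
        front.append((step * dt, half * dx))
        probe.append(N[probe_cell] / K)
    return front, probe

def first_time_at_or_above(series, level, dt=1.0):
    """Time of the first sample at or above `level`, or None if never reached."""
    for k, value in enumerate(series, start=1):
        if value >= level:
            return k * dt
    return None

front, probe = simulate_fisher()
# Average front speed over the second half of the run (years 1000 to 2000).
speed = (front[1999][1] - front[999][1]) / (front[1999][0] - front[999][0])
# Local transit time from 10 percent to 90 percent of carrying capacity.
transit = first_time_at_or_above(probe, 0.9) - first_time_at_or_above(probe, 0.1)
print(speed, 2.0 * math.sqrt(0.01 * 100.0), transit)
```

With these assumed values the measured front speed should settle close to the classical result 2√(rD) for this kind of equation, and the probe cell passes from low to high population pressure in a span that is short compared with the time the wave needs to cross the whole domain. Varying D changes how fast the wave travels, while the local transit time is governed mainly by r.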


The Dynamics of Innovation 

So far we have assumed that the carrying capacity of the environment is fixed 
(save where it is increased by fortuitous inventions). However, we know that 
people respond to scarcity caused by population pressure by intensifying produc¬ 
tion, for example, by shifting from less labor-intensive to more labor-intensive 
foraging, or by innovations that increase the efficiency of subsistence (Boserup, 
1981). Since innovation increases carrying capacity, intuition suggests that it 
might therefore delay the onset of population pressure. However, as the model 
in Appendix 2 shows, this intuition, too, is faulty. 

Figure 17.5 shows the results of the model in Appendix 2. A small popu¬ 
lation initially grows rapidly. As population pressure builds, population growth 
rate slows to a steady state in which population pressure is constant, and just 
enough innovation occurs to compensate for population growth. For plausible
parameter values the second, steady-state phase of population growth is reached
in less than a thousand years. Interestingly, increasing the intrinsic rate of
innovation or the innovation threshold reduces the waiting time until population
pressure is important. Innovation allows greater population increases over the 
long run, but it does not change the timescales on which population pressure
occurs. The most important factor on timescales of a millennium or greater (if
not a century or greater, given realistic starting populations) is the rate of
intensification by innovation, not population growth.

Figure 17.5. This figure plots the logarithm of population size as a function of time for the
model described in Appendix 2. Initially, when there is little population pressure,
population grows at a high rate. As the population grows, per capita income decreases,
and people intensify. Eventually the population growth rate approaches a constant value
at which the growth of intensification balances growth in population. For reasonable
parameters (a = 0.005, r = 0.02, y_m = 1, y_s = 0.1, y_t = 0.2, initial population size 1 percent
of initial carrying capacity), it takes less than 500 years to shift from the initial low
population pressure mode of growth to the final high population pressure mode of
growth.

This picture of the interaction of demography and innovation leads to 
predictions quite different from those of scholars like Cohen (1977). For ex¬ 
ample, we do not expect to see any systematic evidence of increased population 
pressure immediately prior to major innovations, an expectation consistent with 
the record (Hayden, 1995). If people are motivated to innovate whenever 
population pressure rises above an innovation threshold, and if, in the absence 
of successful innovation, populations adjust relatively quickly to changes in K by 
growth or contraction, then evidence of extraordinary stress—for example, 
skeletal evidence of malnutrition—is likely only when rapid environmental de¬ 
terioration exceeds a population’s capacity to respond via a combination of down¬ 
ward population adjustment and innovation. 5 Thus, for parameter values that 
seem anywhere near reasonable to us, population growth on millennial time- 
scales will be limited by rates of improvement in subsistence efficiency, not by 
the potential of populations to grow, just as Malthus argued. Populations can 
behave in non-Malthusian ways only under extreme assumptions about popu¬ 
lation dynamics and rates of intensification, such as the modern world in which 
the rate of innovation, but also the rate of population growth, is very high. 

Of course, in a time as variable as the Pleistocene epoch, populations may well 
have spent considerable time both far above and far below instantaneous carrying
capacity. If agricultural technologies were quick and easy to develop, the
population-pressure argument would lead us to expect Pleistocene populations to shift in and
out of agriculture and other intensive strategies as they find themselves in subsis¬ 
tence crises due to environmental deterioration or in periods of plenty due to ame¬ 
lioration. Most likely, minor intensifications and de-intensifications were standard 
operating procedure in the Pleistocene. However, the time needed to progress 
much toward plant-rich strategies was greater than the fluctuating climate allowed, 
especially given CO2- and aridity-limited plant production.

Cultural Evolution Has the Wrong Timescale 

The timing of the origin of agriculture might possibly be explained entirely by the 
rate of intensification by innovation. For example, Braidwood (1960) argued 
that it took some time for humans to acquire enough familiarity with plant 
resources to use them as a primary source of calories, and that this “settling in” 
process limited the rate at which agriculture evolved. This proposal may explain 
the post-Pleistocene timing of the development of agriculture. However, if we 
interpret his argument to be that the settling-in process began with the evolution of 
behaviorally modern humans, the timescale is wrong again. There is no evidence 
that people were making significant progress at all toward agriculture for 30,000 
years, and Braidwood’s excavations at Jarmo show that some 4,000 years was 
enough to go from an unintensive hunting-and-gathering subsistence system to 
settled village agriculture in a fast case. Ten thousand years in the Holocene was 
ultimately sufficient for the development of plant-intensive gathering technologies 
or agriculture everywhere except in the coldest, plant-poor environments. 


The Pattern of Intensification across Cases 
Implicates Climate Change 

We have argued that Malthusian processes lead to population pressure much 
more quickly than assumed by such writers as Cohen (1977) and that the rate 
of cultural “settling in” and intensification is faster than Braidwood (1960) 
imagined, but not fast enough to intensify more than a small distance toward 
agriculture in the highly variable environments of the Pleistocene. Thus, our hy¬ 
pothesis that the abrupt transition from glacial to Holocene climates caused the 
origin of agriculture requires that Holocene rates of intensification be neither too 
slow nor too fast. 

Agriculture Was Independently Evolved about 10 Times

The sample of origins is large enough to support some generalizations about the 
processes involved. Table 17.1 gives a rough time line for the origin of agricul¬ 
ture in seven fairly well-understood centers of domestication, two more con¬ 
troversial centers, three areas that acquired agriculture by diffusion, and two 
areas that were without agriculture until European conquest. 6 The list of
independent centers is complete as far as current evidence goes, and while new
centers are not unexpected, it is unlikely that the present list will double.
Numerous areas acquired agriculture by diffusion (societies acquire most of their
technological innovations by diffusion, not independent invention), so the three
areas in table 17.1 are but a small sample. The number of nonarctic areas
without agriculture at European contact is small and the two listed, western North
America and Australia, are the largest and best known.

Table 17.1. Dates before present in calendar years of achievement of plant-intensive
hunting and gathering and agriculture in different regions, mainly after Smith (1995)

Region                                              Intensive foraging  Agriculture

Centers of domestication
  Near East: Bar-Yosef and Meadow, 1995             15,000              11,500
  North China: An, 1991; Elston et al., 1997        11,600              > 9,000
  South China: An, 1991                             12,000?             8,000
  Sub-Saharan Africa: Klein, 1993                   9,000               4,500
  Southcentral Andes: Smith, 1995                   7,000               5,250
  Central Mexico: Smith, 1995                       7,000               5,750
  Eastern United States: Smith, 1995                6,000               5,250

Controversial centers
  Highland New Guinea: Golson, 1977                 ?                   9,000?
  Amazonia: Pearsall, 1995                          13,000?             9,000?

Acquisition by diffusion
  Northwestern Europe                               12,500              7,000
  Southwestern U.S.: Cordell, 1984; Doelle, 1999    6,000               3,500
  Japan: Aikens and Akazawa, 1996; Crawford, 1992   10,500              3,000

Never acquired agriculture
  California: Bettinger, 2000                       4,000               n/a
  Australia: Hiscock, 1994; Smith, 1987             3,500               n/a

Two lines of evidence indicate that the seven centers of domestication are 
independent. First, the domesticates taken up in each center are distinctive, and 
no evidence of domesticates from other centers turns up early in any of the 
sequences. For example, in the eastern North American center a sunflower, a 
goosefoot, marsh elder, an indigenous squash, and other local plants were taken 
into cultivation around 6,000 B.P. Mesoamerican maize subsequently appeared 
here around 2,000 B.P. but remained a minor domesticate until around 1,100 B.P., 
when it suddenly crowded out several traditional cultivars (Smith, 1989). Sec¬ 
ond, archaeology suggests that none of the centers had agricultural neighbors at 
the time that their initial domestications were undertaken. The two problematic 
centers, New Guinea and Lowland South America, present difficult archaeological
problems (Smith, 1995). Sites are hard to find and organic remains are
rarely preserved. The New Guinea evidence consists of apparently human-constructed
ditches that might have been used in controlling water for taro
cultivation. The absence of documented living sites associated with these features
makes their interpretation quite difficult. The lowland South American
evidence consists of starch grains embedded in pottery fragments and phytoliths,
microscopic siliceous structural constituents of plant cell walls. The large size of
some early starch grains and phytoliths convinces some archaeologists that root
crops were brought under cultivation in the Amazon Basin at very early dates.

The timing of initiation of agriculture varies quite widely. The Near Eastern 
Neolithic is the earliest so far attested. In northern, and possibly southern, China, 
however, agriculture probably followed within a thousand years of the beginning of 
the Holocene, even though the best-documented, clearly agricultural complexes 
are still considerably later (An, 1991; Crawford, 1992; Lu, 1999). Agriculture may 
prove to be as early in northern China as in the Near East, since the earliest dated 
sites, which extend back to 8,500 B.P., represent advanced agricultural systems that 
must have taken some time to develop. Excavations in northern China north of the 
earliest dated agricultural sites document a technological change around 11,600 
B.P., signaling a shift toward intensive plant and animal procurement that may have 
set this process in motion (Elston et al., 1997).

The exact sequence of events also varies quite widely. For example, in the 
Near East, sedentism preceded agriculture, at least in the Levantine Natufian 
sequence, but in Mesoamerica crops seem to have been added to a hunting-and- 
gathering system that was dispersed and long remained rather mobile (MacNeish, 
1991:27-29). For example, squash seems to have been cultivated around 
10,000 B.P. in Mesoamerica, some 4,000 years before corn and bean domestication 
began to lead to the origin of a fully agricultural subsistence system (Smith, 
1997). Some mainly hunting-and-gathering societies seem to have incorporated 
small amounts of domesticated plant foods into their subsistence system without 
this leading to full-scale agriculture for a very long time. Perhaps American do¬ 
mesticates were long used to provide specialized resources or to increase food 
security marginally (Richard Redding, personal communication) and initially 
raised human carrying capacities relatively little, thus operating the competitive 
ratchet quite slowly. According to MacNeish, the path forward through the 
whole intensification sequence varied considerably from case to case. 

A Late Intensification of Plant Gathering Precedes Agriculture 

In all known cases, the independent centers of domestication show a late se¬ 
quence of intensification beginning with a shift from a hunter-gatherer subsis¬ 
tence system based upon low-cost resources using minimal technological aids to 
a system based upon the procurement and processing of high-cost resources, 
including small game and especially plant seeds or other labor-intensive plant 
resources, using an increasing range of chipped and ground stone tools (Hayden, 
1995). The reasons for this shift are the subject of much work among archae¬ 
ologists (Bettinger, 2000). The shifts at least accelerate and become widespread 
only in the latest Pleistocene or Holocene. However, a distinct tendency toward 
intensification is often suggested for the Upper Paleolithic more generally. Stiner 
et al. and commentators (2000) note that Upper Paleolithic peoples often made 
considerable use of small mammals and birds in contrast to earlier populations. 
These species have much lower body fat than large animals, and excessive
consumption causes ammonia buildup in the body due to limitations on the rate 
of urea synthesis (“rabbit starvation”; Cordain et al., 2000). Consequently, any 
significant reliance on low-fat small animals implies corresponding compensation 
with plant calories, and at least a few Upper Paleolithic sites, such as the Ohalo II 
settlement on the Sea of Galilee (Kislev et al., 1992), show considerable use of 
plant materials in Pleistocene diets. Large-seeded annual species like wild barley 
were no doubt attractive resources in the Pleistocene when present in abundance 
and would have been used opportunistically during the last glacial age. If our 
hypothesis is correct, in the last glacial age no one attractive species like wild 
barley would have been consistently abundant (or perhaps productive enough) 
for a long enough span of time in the same location to have been successfully 
targeted by an evolving strategy of intensification, even if their less intensive 
exploitation was common. The broad spectrum of species, including small game 
and plants, reflected in these cases is not per se evidence of intensification 
(specialized use of more costly but more productive resources using more labor 
and dedicated technology), as is sometimes argued (Flannery, 1971). In most 
hunter-gatherer systems, marginal diet cost and diet richness (number of species 
used) are essentially independent (Bettinger, 1994:46-47), and prey size is far 
less important in determining prey cost than either mode or context of capture 
(Bettinger, 1993:51-52; Bettinger and Baumhoff, 1983:832; Madsen and Schmitt, 
1998). For all these reasons, quantitative features of subsistence technology are a 
better index of Pleistocene resource intensification than species used. We believe 
that the dramatic increase in the quantity and range of small chipped stone and 
groundstone tools only after 15,000 B.P. signals the beginning of the pattern of 
intensification that led to agriculture. 

Early intensification of plant resource use would have tended to generate the 
same competitive ratchet as the later forms of intensification. Hunter-gatherers 
who subsidize hunting with plant-derived calories can maintain higher popula¬ 
tion densities and thus will tend to deplete big game to levels that cannot sustain 
hunting specialists (Winterhalder and Lu, 1997). Upper Paleolithic people ap¬ 
pear to be fully modern in their behavioral capacities (Klein, 1999). Important 
changes in subsistence technology did occur during the Upper Paleolithic, for 
example, the development of the atlatl. Nevertheless, modern abilities and the 
operation of the competitive ratchet drove Upper Paleolithic populations only a 
relatively small distance down the path to the kind of heavy reliance on plant 
resources that in turn set the stage for domestication. 

Braidwood’s reasoning that pioneering agriculturalists would have gained 
their intimate familiarity with proto-domesticates first as gatherers is logical and 
supported by the archaeology. Once the climate ameliorated, the rate of inten¬ 
sification accelerated immediately in the case of the Near East. In other regions 
changes right at the Pleistocene-Holocene transition were modest to invisible 
(Straus et al., 1996). The full working out of agrarian subsistence systems took 
thousands of years. Indeed, modern breeding programs illustrate that we are still 
working out the possibilities inherent in agricultural subsistence systems. 

The cases where Holocene intensification of plant gathering did not lead 
directly to agriculture are as interesting as the cases where it did. The Jomon of 
Japan represents one extreme (Imamura, 1996). Widespread use of simple pottery,
a marker of well-developed agricultural subsistence in western Asia, was
very early in the Jomon, contemporary with the latest Pleistocene Natufian in the
Near East. By 11,000 years B.P., the Jomon people lived in settled villages,
depended substantially upon plant foods, and used massive amounts of pottery.
However, the Jomon domesticated no plants until rather late in the sequence.
Seeds of weedy grasses are found throughout, but only in later phases (after about
3,000 B.P.) do the first unambiguous domesticates occur, and these make up only a
small portion of the seeds in archaeological contexts (Crawford, 1997). Sophisticated
agriculture came to Japan with imported rice from the mainland only
about 2,500 B.P. Interestingly, acorns were a major item of Jomon subsistence.
The people of California were another group of sedentary hunter-gatherers that
depended heavily on acorns. However, in California the transition to high plant
dependence began much later than in the Jomon (Wohlgemuth, 1996). Millingstones
for grinding small seeds became important after 4,500 B.P., although seeds
were of relatively minor importance overall. After 2,800 B.P. acorns processed
with mortars and pestles became an important subsistence component and small
seeds faded in comparative importance. In the latest period, after 1,200 B.P.,
quantities of small seeds were increasingly added back into the subsistence mix
alongside acorns in a plant-dominated diet. Other peoples with a late onset of
intensification include the Australians. The totality of cases tells us that any stage of
the intensification sequence can be stretched or compressed by several thousand
years but reversals are rare (Harris, 1996; Price and Gebauer, 1995). Farming did
give way to hunting-and-gathering in the southern and eastern Great Basin of
North America after a brief extension of farming into the region around 1,000 B.P.
(Lindsay, 1986). A similar reversal occurred in southern Sweden between 2,400
and 1,800 B.P. (Zvelebil, 1996). Horticultural Polynesian populations returned
substantially to foraging for a few centuries while population densities built up on
reaching the previously uninhabited archipelagos of Hawaii and New Zealand
(Kirch, 1984). Had intensification on plant resources been possible during the last
glacial age, even the slowest Holocene rates of intensification were rapid enough to
produce highly visible archaeological evidence on the 10-millennium timescale,
one-third or less of the time that Upper Paleolithic peoples lived under glacial climates.

More Intensive Technologies Tend to Spread 

One successful and durable agricultural origin in the last glacial age on any 
sizeable land mass would have been sufficient to produce a highly visible ar¬ 
chaeological record, to judge from events in the Holocene epoch. Once well- 
established agricultural systems existed in the Holocene, they expanded at the 
expense of hunting-and-gathering neighbors at appreciable rates (Bellwood, 
1996). Ammerman and Cavalli-Sforza (1984) summarize the movement of
agriculture from the Near East to Europe, North Africa, and Asia. The spread 
into Europe is best documented. Agriculture reached the Atlantic seaboard 
about 6,000 B.P. or about 4,000 years after its origins in the Near East. The reg¬ 
ularity of the spread, and the degree to which it was largely a cultural diffusion 
process as opposed to a population dispersion as well, are matters of debate. 
Cavalli-Sforza, Menozzi, and Piazza (1994:296-299) argue that demic expansion
by western Asians was an important process with the front of genes moving
at about half the rate of agriculture. They imagine that pioneering agricultural 
populations moved into territories occupied by hunter-gatherers and inter¬ 
married with the preexisting population. The then-mixed population in turn 
sent agricultural pioneers still deeper into Europe. They also suppose that the 
rate of spread was fairly steady, though clearly frontiers between hunter-gath¬ 
erers and agriculturalists stabilized in some places (Denmark, Spain) for rela¬ 
tively prolonged periods. Zvelebil (1996) emphasizes the complexity and 
durability of frontiers between farmers and hunter-gatherers and the likelihood 
that in many places the diffusion of both genes and ideas about cultivation was a 
prolonged process of exchange across a comparatively stable ethnic and eco¬ 
nomic frontier. Further archaeological and paleogenetic investigations will no 
doubt gradually resolve these debates. Clearly, the spread process is at least 
somewhat heterogeneous. 

Other examples of the diffusion of agriculture are relatively well docu¬ 
mented. For example, maize domestication is dated to about 6,200 B.P. in 
Central Mexico, spreading to what is now the southwestern United States (New 
Mexico) by about 4,000 B.P. (Matson, 1999; Smith, 1995). In this case, the 
frontier of maize agriculture stabilized for a long time, only reaching the area 
now the eastern United States at the comparatively late date noted. Maize failed
entirely to diffuse westward into the Mediterranean parts of California even 
though peoples growing it in the more arid parts of its range in the Southwest 
used irrigation techniques that have eventually worked in California with modest 
modifications to cope with dry-season irrigation. As with the origin process, the 
rate of spread of agriculture exhibits an interesting degree of variation. 


Changes in the Cultural Evolutionary System? 

A possible alternative to our hypothesis would be that a substantial moderni¬ 
zation of the cultural system occurred coincidently at the end of the Pleistocene 
epoch and that this resulted in a general acceleration of rates of cultural evo¬ 
lution, including subsistence intensification. The modernization of culture ca¬ 
pacities leading up to the Upper Paleolithic transition was presumably such an 
event, as were later inventions like literacy (Donald, 1991; Klein, 1999: ch.7). 
We are not aware of any proposals for major changes in the intrinsic rate of 
cultural evolution coincident with the Pleistocene-Holocene boundary. Students 
of the evolution of subsistence intensification and social complexity in the Ho¬ 
locene have suggested a series of plausible processes that will probably turn out 
to be at least part of the explanation for why the trend to intensification has 
taken such diverse forms in different regions (table 17.2). This list of diversifying 
and rate-limiting processes does not include any that should have operated more 
stringently on Upper Paleolithic, as opposed to Mesolithic and Neolithic, socie¬ 
ties, climate effects aside. Holocene rates of intensification do have the right time- 
scale to be drastically affected by millennial- and submillennial-scale variation 
that is rapid with respect to observed rates of cultural evolution in the Holocene. 
Table 17.2. Processes that may retard the rate of cultural evolution and create local
optima that halt evolution for prolonged periods

Process                                             Authors (examples)

Geography: Eurasia, having the largest land         Diamond, 1997
mass, has more local populations to exchange
innovations by diffusion, hence the fastest
Holocene rate of subsistence intensification.

Minor climate change: The late medieval onset       Kent, 1987; Kleivan, 1984
of the Little Ice Age caused the extinction of
the Greenland Norse colony. Agriculture
at marginal altitudes in places like the
Andes seems to respond to Holocene
climatic fluctuation.

Preadapted plants: The Mediterranean Old            Blumler, 1992; Blumler and
World is unusually well endowed with large-         Byrne, 1991; Diamond, 1997;
seeded grasses susceptible to domestication         Hillman and Davies, 1990
pressures. American domesticates, especially
maize, may outcross too much to respond
quickly to selection.

Diseases: Density-dependent epidemic diseases       Cavalli-Sforza et al., 1994;
may evolve that slow or stop the population         Crosby, 1986; Gifford-Gonzalez,
growth, pending the evolution of resistance,        2000; McNeill, 1976
that would otherwise drive the competitive
ratchet. Local diseases that attack
foreigners may protect otherwise-
vulnerable systems.

New technological complexes evolve slowly:          Katz et al., 1974
Nutritional adequacy in plant-rich
diets requires discovering
cooking techniques, acquiring balancing
domesticates, developing the potential of
animal domesticates, and the like.

New social institutions evolve slowly: Social       Bettinger, 1999; Bettinger and
institutions are generally deeply involved in       Baumhoff, 1982; North and
subsistence but are also liable to be regulated     Thomas, 1973; Richerson
by norms that make adaptive evolution               and Boyd, 1999
away from current local optima difficult.

Ideology may play a role: The evolution of          Weber, 1930
fads, fashions, and belief systems may act to
drive cultural evolution in nonutilitarian
directions that sometimes carry them to new
adaptive slopes.


If climate variation did not limit intensification during the last glacial age to 
vanishingly slow rates compared to the Holocene epoch, the failure of intensive 
systems to evolve during the tens of millennia anatomically and culturally 
modern humans lived as sophisticated hunter-gatherers before the Holocene is a 
considerable mystery. 
Conclusion

The large, rapid change in environment at the Pleistocene-Holocene transition 
set off the trend of subsistence intensification of which modern industrial in¬ 
novations are just the latest examples. If our hypothesis is correct, the reduction 
in climate variability, increase in CO2 content of the atmosphere, and increases
in rainfall rather abruptly changed the earth from a regime where agriculture 
was impossible everywhere to one where it was possible in many places. Since 
groups that use efficient, plant-rich subsistence systems will normally outcompete 
groups that make less efficient use of land, the Holocene has been characterized 
by a persistent, but regionally highly variable, tendency toward subsistence
intensification. The diverse trajectories taken by the various regional human
subpopulations since about 11,600 B.P. are natural experiments that will help us
elucidate the factors that control the tempo of cultural evolution and that
generate historical contingency against the steady, convergent adaptive pressure
toward ever more intense production systems. A long list of processes (table
17.2) interacted to regulate the nearly unidirectional trajectory of subsis¬ 
tence intensification, population growth, and institutional change that the 
world’s societies have followed in the Holocene. Notably, even the slowest 
evolving regions generated quite appreciable and archaeologically visible inten¬ 
sification, demanding some explanation for why similar trajectories are absent in 
the Pleistocene. 

Those who are familiar with the Pleistocene epoch often remark that the 
Holocene is just the “present interglacial.” The return of climate variation on the 
scale that characterized the last glacial age is quite likely if current ideas about 
the Milankovitch driving forces of the Pleistocene are correct. Sustaining agriculture
under conditions of much higher amplitude, high-frequency environmental
variation than farmers currently cope with would be a considerable
technical challenge. At the very best, lower CO2 concentrations and lower average
precipitation suggest that world average agricultural output would fall
considerably.

Current anthropogenic global warming via greenhouse gases might at least 
temporarily prevent any return to glacial conditions. However, we understand 
the feedbacks regulating the climate system too poorly to have any confidence in 
such an effect. Current increases in CO2 threaten to elevate world temperatures
to levels that in past interglacials apparently triggered a large feedback effect 
producing a relatively rapid decline toward glacial conditions (Petit et al., 1999).
The Arctic Ocean ice pack is currently thinning very rapidly (Kerr, 1999). A 
dark, open Arctic Ocean would dramatically increase the summer heat income 
at high northern latitudes and have large, difficult-to-guess impacts on the 
Earth’s climate system. No one can yet estimate the risks we are taking of a rapid 
return to a colder, drier, more variable environment with less CO2 or evaluate
exactly the threat such conditions imply for the continuation of agricultural 
production. Nevertheless, the intrinsic instability of the Pleistocene climate 
system, and the degree to which agriculture is likely dependent upon the Ho¬ 
locene stable period, should give one pause (Broecker, 1997). 
APPENDIX 1: More Realistic Population Dynamics

The logistic equation assumes that an increment to population density has the same 
effect on population pressure at low densities as at high densities. We know that this 
assumption is not correct in all cases. For example, hunters pursuing herd animals 
may generate much population pressure at low human population densities because 
killing only a small fraction of the herd makes the many survivors difficult to hunt. 
On the other hand, subsistence farmers spreading into a uniform fertile plain may 
feel little population pressure until all farmland is occupied. If returns to additional 
labor on shrinking farms then drop steeply, most population pressure will be felt at 
densities near K. To allow for such effects, ecologists often utilize a generalized lo¬ 
gistic equation: 


$$\frac{dN}{dt} = rN\left[1 - \left(\frac{N}{K}\right)^{\theta}\right] \tag{A1.1}$$


Population pressure is now given by the term $(N/K)^{\theta}$. If $\theta > 1$, population pressure
does not increase until densities approach carrying capacity, as is usually the case for
species like humans that have flexible behavior and considerable mobility, and thus
can mitigate the effects of increasing population density over some range of densities.
It seems intuitive that this would increase the length of time necessary to reach a
given level of population pressure. However, this intuition is wrong. The generalized
logistic can be used to derive a differential equation for $n = (N/K)^{\theta}$:

$$\frac{dn}{dt} = \frac{\theta}{N}\left(\frac{N}{K}\right)^{\theta}\frac{dN}{dt}
= r\theta\left(\frac{N}{K}\right)^{\theta}\left[1 - \left(\frac{N}{K}\right)^{\theta}\right]
= r\theta\, n(1 - n) \tag{A1.2}$$


Thus, the differential equation for population pressure is always the ordinary logistic
equation in which $K = 1$ and $r$ is multiplied by $\theta$. This means that when $\theta > 1$, it takes
less time to reach a given amount of population pressure than would be the case if
$\theta = 1$. Reduced population pressure at low densities leads to more rapid initial
population growth. Population growth stays close to exponential for longer, and this more than
compensates for the fact that higher densities have to be reached to achieve the same
level of population pressure.
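
The comparison made in this appendix can be checked numerically. The sketch below is a minimal illustration and not part of the original analysis: it integrates (A1.1) by simple Euler steps with density expressed as x = N/K, reports the time needed for population pressure x^θ to reach a target level, and compares it with the closed-form time implied by the logistic equation (A1.2). The function names, the growth rate r = 0.02, the starting density x = 0.01, and the target pressure 0.9 are all assumptions chosen only for illustration.

```python
import math


def time_to_pressure(r, theta, x0, target, dt=0.05):
    """Euler-integrate dx/dt = r*x*(1 - x**theta), where x = N/K,
    and return the first time at which pressure x**theta >= target."""
    x, t = x0, 0.0
    while x ** theta < target:
        x += dt * r * x * (1.0 - x ** theta)
        t += dt
    return t


def time_to_pressure_exact(r, theta, x0, target):
    """Closed-form time from (A1.2): n = x**theta follows an ordinary
    logistic with rate r*theta, so logit(n) grows linearly in time."""
    n0 = x0 ** theta

    def logit(n):
        return math.log(n / (1.0 - n))

    return (logit(target) - logit(n0)) / (r * theta)


if __name__ == "__main__":
    # Illustrative (assumed) values: r = 0.02 per year, start at 1% of K.
    for theta in (1.0, 3.0):
        numeric = time_to_pressure(0.02, theta, 0.01, 0.9)
        exact = time_to_pressure_exact(0.02, theta, 0.01, 0.9)
        print(f"theta={theta}: numeric {numeric:.1f}, closed form {exact:.1f}")
```

With these assumed values the population with θ = 3 reaches a pressure of 0.9 sooner than the population with θ = 1. Other targets and starting conditions can be explored by changing the arguments.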


APPENDIX 2: The Dynamics of Innovation 


Consider a population of size N in which the per capita income of the population is 
given by: 


$$y = \frac{y_m I}{I + N} \tag{A2.1}$$

where $y_m$ is the maximum per capita income, and $I$ is a variable that represents the
productivity of subsistence technology. Thus, per capita income declines as population
size increases, but for a given population size, greater productivity raises per
capita income. As in the previous models, we assume that as population pressure,
now measured as falling per capita income, increases, population growth decreases.
In particular, assume:

$$\frac{dN}{dt} = \rho N (y - y_s) \tag{A2.2}$$

where $y_s$ is the per capita income necessary for subsistence. If per capita incomes are
above this value, population increases; if per capita income falls below $y_s$, population
shrinks. If $I$ is fixed, this equation is another generalization of the logistic equation. In
an initially empty environment, population initially grows at a rate $\rho(y_m - y_s)$, but
then slows and reaches an equilibrium population size:


$$\hat{N} = \frac{I\,(y_m - y_s)}{y_s} \tag{A2.3}$$
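
The equilibrium follows in a few steps from (A2.1) and (A2.2). The short derivation below is added for convenience; $\hat{N}$ is used here simply as a label for the equilibrium population size and $\rho$ for the growth coefficient in (A2.2).

```latex
\begin{align*}
\frac{dN}{dt} = 0,\; N > 0
  &\;\Longrightarrow\; y = y_s \\
\frac{y_m I}{I + \hat{N}} = y_s
  &\;\Longrightarrow\; y_m I = y_s\,(I + \hat{N}) \\
  &\;\Longrightarrow\; \hat{N} = \frac{I\,(y_m - y_s)}{y_s}
\end{align*}
```

As $N \to 0$, (A2.1) gives $y \to y_m$, which is why the initial growth rate is $\rho(y_m - y_s)$.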


To allow for intensification we assume that people innovate whenever their per
capita income falls below a threshold value $y_i$. Thus:

$$\frac{dI}{dt} = \alpha I (y_i - y) \tag{A2.4}$$

When per capita income is less than the threshold value $y_i$, people innovate,
increasing the carrying capacity and therefore decreasing population pressure. When
per capita income is greater than the threshold, they “de-innovate.” This may seem
odd at first, but such abandonment of more efficient technology has been observed
occasionally. The maximum rate at which innovation can occur is governed by the
parameter $\alpha$.

If a small pioneer population enters an empty habitat, it experiences two distinct
phases of expansion (figure 17.5). Initially, per capita income is near the maximum,
and population grows at the maximum rate. As population density increases, per
capita income drops below $y_i$, and the population begins to innovate, eventually
reaching a steady state value:

$$\hat{y} = \frac{\rho y_s + \alpha y_i}{\rho + \alpha} \tag{A2.5}$$

The steady state per capita income is above the minimum for subsistence but below
the threshold at which people experience population pressure and begin to innovate.
At this steady state population growth continues at a constant rate,

$$\frac{1}{N}\frac{dN}{dt} = \frac{\rho\alpha\,(y_i - y_s)}{\rho + \alpha} \tag{A2.6}$$

that is proportional to the rate of growth of subsistence efficiency.
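
The two-phase expansion described in this appendix can be explored numerically. The following sketch is an illustration under stated assumptions rather than the authors' own computation: it integrates (A2.2) and (A2.4), with income given by (A2.1), using Euler steps, and writes rho for the population response coefficient and alpha for the innovation rate. The steady state follows from requiring N and I to grow at the same proportional rate, rho(y - y_s) = alpha(y_i - y), which is the condition that (A2.5) solves. The function names and every parameter value (y_m = 1, y_s = 0.2, y_i = 0.5, rho = 0.02, alpha = 0.005, and the initial N and I) are hypothetical.

```python
def income(y_m, I, N):
    """Per capita income, equation (A2.1)."""
    return y_m * I / (I + N)


def steady_state_income(rho, alpha, y_s, y_i):
    """Steady-state per capita income, equation (A2.5)."""
    return (rho * y_s + alpha * y_i) / (rho + alpha)


def simulate(rho, alpha, y_m, y_s, y_i, N0, I0, t_end, dt=0.1):
    """Euler-integrate (A2.2) and (A2.4); return the final (N, I, y)."""
    N, I = N0, I0
    for _ in range(int(t_end / dt)):
        y = income(y_m, I, N)
        dN = rho * N * (y - y_s)      # population growth, (A2.2)
        dI = alpha * I * (y_i - y)    # innovation or de-innovation, (A2.4)
        N += dt * dN
        I += dt * dI
    return N, I, income(y_m, I, N)


if __name__ == "__main__":
    # All values below are assumed for illustration only.
    rho, alpha, y_m, y_s, y_i = 0.02, 0.005, 1.0, 0.2, 0.5
    N, I, y = simulate(rho, alpha, y_m, y_s, y_i, N0=0.01, I0=1.0, t_end=5000)
    print("simulated income:", y)
    print("predicted by (A2.5):", steady_state_income(rho, alpha, y_s, y_i))
    print("simulated growth rate:", rho * (y - y_s))
    print("predicted by (A2.6):", rho * alpha * (y_i - y_s) / (rho + alpha))
```

Starting from a small pioneer population, income in this sketch first falls from near y_m and then settles at the value given by (A2.5), after which N and I grow at the common rate given by (A2.6).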


NOTES 

We thank Joe Andrew, Ofer Bar-Yosef, Richard Redding, Bruce Winterhalder, and 
three anonymous referees for unusually constructive criticism of the manuscript. 
Thanks to Scott Elias for insights pertaining to Pleistocene seasonality and to Peter 
Ditlevsen for providing figure 17.2. Peter Lindert’s invitation to give a seminar led to 
the first draft of this chapter. Thanks to Francisco Gil-White for assistance with the 
Spanish abstract in the original article. 

1. We define “efficiency” as the productivity per unit area of land exploited for
subsistence. Efficiency of subsistence is favored by strategies that move subsistence 
down the food chain, especially to high-productivity plant resources. “Intensification”
we define as the use of human labor to add productive lower-ranked resources
to the diet or the use of technological innovations to increase the rank of more 
productive resources. Typically both strategies are employed simultaneously. Since 
increases in efficiency are achieved by either labor or technical intensification and 
since increases in efficiency usually also lead to population growth, we use the term 
“intensification” loosely for the interlinked processes of labor and technical inten¬ 
sification and population growth. We define “agriculture” as dependence upon do¬ 
mesticated crops and animals for subsistence. We mark the origin of agriculture as 
the first horizon in which plant remains having anatomical markers of domestication 
are found, or are likely on other grounds to be found in the future. Fully agricultural 
subsistence systems in the sense of a dominance of domesticated species in the diet 
typically postdate the origin of agriculture by a millennium or more. 

2. It has also been argued that Pleistocene climates were less seasonally variable 
than during the Holocene, but this idea has scant empirical support (Miracle and 
O’Brien, 1998). Elias (1999) has used fossil beetle faunas to estimate July and Jan¬ 
uary temperatures in Holocene and Pleistocene deposits. These data suggest that the 
Pleistocene was more seasonal than the Holocene. However, beetle estimates of 
January temperatures are not very reliable because beetles in temperate and arctic 
climates overwinter in a dormant state so that their distributions are rather insensi¬ 
tive to winter as opposed to summer temperatures. Plant distributions are similarly 
affected. No current method of estimating winter temperatures in the Pleistocene is 
reliable. 

3. Agronomists and climatologists have recently become interested in the impacts
of climate change and climate variability in the context of CO2-induced global
warming (Bazzaz and Sombroek, 1996; Downing, Olsthoorn, and Tol, 1999; Kane
and Yohe, 2000; Reilly and Schimmelpfennig, 2000; Rosenzweig and Hillel, 1998;
Schneider, Easterling, and Mearns, 2000). Global climate models suggest that global 
warming may increase short timescale climate variation as well as creating a steep 
trend. To some degree, these conditions mimic the millennial and submillennial scale 
variations in the Pleistocene, and, as crop-and-weather models and empirical data 
improve, more definitive assessments of impact of last glacial conditions on plant- 
based subsistence strategies will become possible. 

4. By “adaptive,” we mean behaviors that, by comparison with available al¬ 
ternatives, have the largest population mean fitness. 

5. Some human populations might have curtailed birth rates in order to pre¬ 
serve higher incomes at any given level of intensification. In a sense, such populations 
have just redefined K to be a lower value that permits higher incomes by employing 
what Malthus called the “preventative checks” on population growth. The rest of the 
analysis then applies with K measured in suitably emic terms. Cultural differences in 
the value of intensification threshold or K (Coale, 1986) will make evidence of stress 
more likely in populations where the effective carrying capacity is closer to the 
ultimate subsistence carrying capacity than in populations that reduce growth rates 
by preventative checks that keep population well below absolute subsistence limits. 
The perceived costs of population control, given that the main mechanisms in
nonmodern societies were infanticide and sexual abstinence, may mean that most
populations intensified labor inputs at any given level of technology efficiency to near
subsistence limits (Hayden, 1981). In either event, population pressure will tend to 
stay constant to the extent that rates of population growth and intensification are suc¬ 
cessful in adjusting subsistence to current conditions. Normally population growth 
and decline are quite rapid processes relative to rates of innovation and will keep 
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average population size quite close to K. Short-term departures from K caused by 
short-term environmental shocks and windfalls should be the commonest reasons to 
see especially stressed or unstressed populations. If the rate of innovation is more 
rapid than exponential population growth for any significant time period, then per 
capita incomes can rise under a regime of very rapid population growth, as in the last 
few centuries. This regime, if it had occurred in the past, should be quite visible in 
the historical and archaeological record because it so rapidly leads to large populations
and large-scale creation of durable artifacts. Alternatively, population growth
may have been limited in past populations by the analog of the modern demographic 
transition. Thus, hunter-gatherers might have resisted the adoption of plant-based 
intensification because they viewed the life style associated with plant collecting or 
planting as a decrement to their incomes. However, resisting intensifications that 
increase human densities makes such groups vulnerable to competitive displacement 
by the intensifiers unless the greater wealth of the population limiters allows them to 
successfully defend their resource-rich territories. On the evidence of the fairly rapid 
rate of spread of intensified strategies once invented, such defense is seldom successful
(e.g., Ammerman and Cavalli-Sforza, 1984; Bettinger and Baumhoff, 1982).

6. The dates in table 17.1 reflect considerable recent revision stemming from
accelerator mass spectrometry 14C dating, which permits the use of very small carbon
samples and can be applied directly to carbonized seeds and other plant parts 
showing morphological changes associated with domestication. Isolated seeds tend to 
work their way deep into archaeological deposits, and dates based on associated large 
carbon samples (usually charcoal) often gave anomalously early dates.

PART 5

Links to Other Disciplines 


Biology is an immense enterprise whose purview ranges from the 
physics of enzyme catalysis to the role of gene expression in cell differentiation 
to the evolutionary origins of flight to the global carbon cycle. Nonetheless, 
biology is a single discipline that is taught as a coherent, integrated subject to 
first-year university students. By contrast, each social science has its own independent
introductory course, one that usually makes little reference to
other disciplines. The rigid division of human sciences into disciplines has 
always seemed quite odd to us. In the great scheme of things, humans surely 
present a smaller range of phenomena than all the rest of biology. One reason 
why biology remains a unified discipline is that the science has a small set 
of unifying problems at its core. Physics and chemistry underpin everything. 
Genetics, cell metabolism, ecology, and evolution are relevant to all organisms,
and physiology is common to all multicellular life. A good basic biology
course will show how these integrating subdisciplines relate to one another. 
Practicing biologists often discover that they need to know something of each 
of these integrating subjects in their professional careers. How can the human 
sciences possibly be very different? 

We have no clear idea of why the human sciences have evolved so differently
from biology. Our mentor Donald T. Campbell took an interest in
such matters (Campbell, 1969) and supposed that the social sciences would 
become much more interdisciplinary than they in fact have. In this part, we 
argue that evolutionary theory, specifically the theory of cultural evolution, 
stands ready to play much the same role that organic evolution does in biology. 
The basic argument is very simple. What is the most dramatic feature of 
human life? Certainly one candidate is its dramatic variation in time and space.
No other species changes its behavior so rapidly, and none occupies such a
wide range of environments using such a wide range of economic strategies. 
Evolutionary processes produce this diversity; every culture has descended 
from some immediate ancestor, ultimately tracing back to a common African 
ancestor. Every discipline in the human sciences is centrally concerned with 
cultural evolution and cultural diversity, whether called by these names or not. 
Anthropologists have made the study of cultural diversity their specialty. 
Historians study cultural change in all its forms. For economists, the evolution 
(or growth) of economies is a central theme. Political scientists study opinion, 
policy, and constitutional change; sociologists, institutional change. Cultural 
evolutionists have something to say about some central topics, such as the 
explanation of human cooperation and social institutions (parts 2 and 3 
contain examples). Should human scientists care to emphasize unifying 
problems, cultural evolution can share a portion of the burden. 

Chapter 18 shows economists how a theory of cultural evolution quite 
naturally complements the rational choice theory that is basic to their discipline.
Rational choice theory is one of the other candidates to be a major
unifying element in the human sciences. Yet rational choice theory famously 
lacks psychological realism (Simon, 1959) and lacks an explicit temporal 
dimension (Nelson and Winter, 1982). Here we derive the basic Darwinian 
theory of cultural evolution from Bayesian assumptions applicable to the 
standard rational actor. The behaviors of others are merely a form of proxy 
information about the world, a resource to be tapped in deciding how to behave
oneself. In a world where gaining information tends to be costly, imitating
what others do is an excellent strategy under a wide variety of circumstances. 
An inheritance system provides time-tested information. Using your parents’ 
beliefs or those of others as Bayesian priors is highly adaptive. Doing so allows 
an individual to concentrate scarce resources on updating decent priors rather 
than on starting with less information-rich priors, such as those furnished by 
a generic human nature. Adding these bits of psychological realism yields a 
theory of cultural evolution within which boundedly rational actors play a 
fundamental role. The theory also accounts for important human oddities such 
as our extraordinary cooperation and our susceptibility to certain types of 
maladaptations. Neat as one of Adam Smith’s pins we thought, and still think, 
though the manuscript was rejected by the American Economic Review after 
protracted adventures with editors and reviewers. We suspect the baleful influence
of the lack of a synthetic first-year course is at work here. Subjects not
legitimated in that course, which purports to encompass all someone needs to 
know, are deeply suspect, and culture is generally absent from Econ 1. At the 
same time, an economic anthropologist who taught us a lot about the science 
of culture knew little of what is taught in Econ 1. 

Chapter 19 is directed at those in the social sciences unfamiliar with a 
style of deploying mathematical models that is second nature to economists, 
evolutionary biologists, engineers, and others. Much science in many disciplines
consists of a toolkit of very simple mathematical models. To many not
familiar with the subtle art of the simple model, such formal exercises have 
two seemingly deadly flaws. First, they are not easy to follow. The modern
style of mathematical analysis uses a very compact notation that facilitates
algebra but is quite hard to read. Even the initiated reader might take days to 
deeply understand even a rather elementary model. The untrained are nearly 
helpless. Second, motivation to follow the math is often wanting because the 
model is so cartoonishly simple relative to the real world being analyzed. 
Critics often level the charge “reductionism” with what they take to be 
devastating effect. The modeler’s reply is that these two criticisms actually 
point in opposite directions and sum to nothing. True, the model is quite 
simple relative to reality, but even so, the analysis is difficult. The real lesson 
is that complex phenomena like culture require a humble approach. 

We have to bite off tiny bits of reality to analyze and build up a more global 
knowledge step by patient step. Experimentalists know the same lesson. To 
achieve virtues of experimental control of variables, you have to examine only 
one or a few variables at a time. Similarly, observational studies must examine 
a relatively few dimensions if any explanatory power is to result. Simple 
models, simple experiments, and simple observational programs are the best 
the human mind can do in the face of the awesome complexity of nature. The 
alternatives to simple models are either complex models or verbal descriptions 
and analysis. Complex models are sometimes useful for their predictive 
power, but they have the vice of being difficult or impossible to understand. 
The heuristic value of simple models in schooling our intuition about natural 
processes is exceedingly important, even when their predictive power is limited.
(The predictive power of complex models is no better; they often sacrifice
much transparency for little improvement in predictive power.) Verbal
reasoning is exceedingly important because the human mind seems to be a 
verbal organ. However, words alone can be snares and delusions. Unaided 
verbal reasoning can be unreliable—words are polysemic, and the phenomena 
of the world have quantitative dimensions poorly captured by the qualitative
concepts of natural language. The lesson, we think, is that all serious
students of human behavior need to know enough math to at least appreciate 
the contributions simple mathematical models make to the understanding 
of complex phenomena. The idea that social scientists need less math than 
biologists or other natural scientists is completely mistaken. 

Chapter 20 deals with the vexatious concept of memes. On the one hand, 
we have great sympathy with the views of the “universal” Darwinists like 
Daniel Dennett, Robert Aunger, and Susan Blackmore, who, following 
Richard Dawkins, employ the term to stress the analogies between genes and 
culture. On the other hand, we have several worries. One is academic 
punctilio. When Dawkins (1976) coined the term meme, he quite frankly 
admitted that he had done no scholarship in the social sciences. Fair enough in 
the context of a trade book, but, in fact, another pioneering universal 
Darwinist, Donald Campbell (1965, 1975), had done significant work on 
cultural evolution by 1976. Luca Cavalli-Sforza and Marc Feldman (1973)
had already published their pioneering formal models of cultural evolution. 
The second, more substantive problem is that the analogy between genes and 
culture is not very deep. The two are similar in that important information is 
transmitted between individuals. Both systems create patterns of heritable
variation, which in turn implies that the population-level properties of both
systems are important. Population-level properties require broadly Darwinian 
methods for analysis. But this just about exhausts the similarities. The list of 
differences is much larger. Culture is not based on direct replication but upon 
teaching and imitation. The transmission of culture is temporally extended. It 
is not necessarily particulate. Psychological processes have a direct impact on 
what is transmitted and remembered. These psychological effects can produce 
complex adaptations in the absence of natural selection. Users of the meme 
concept seem to us to believe that it does more work than it really does. Third, 
most users of the meme concept follow Dawkins in being rather incurious
about the existing scholarship on the nature of cultural transmission. A large
amount of data already exists on how
culture works as an inheritance system and as an evolutionary system. 
Linguists are perhaps the most advanced students of memes (e.g., Bloom, 
2000). Building upon such existing scholarship is surely the most effective way 
to make progress. Other domains of culture—social organization, technology, 
folk science—may be governed by rather different principles. The job of 
synthesizing what we already know and drawing lessons for future work is left 
undone to the extent that we think that the analogy with genes is a sufficient 
foundation for a science of culture. It isn’t. 

We believe that the Darwinian theory of cultural evolution will make 
contributions across the broad sweep of problems in the human sciences, but 
the project is one of introducing additional useful tools and unifying concepts 
rather than an imperial ambition to replace great swaths of existing theory or 
methods.

18 Rationality, Imitation,
and Tradition 


When the quality of information is poor, people often rely on 
tradition in making economic decisions. What is the best retail markup percentage?
When should one refinance one’s home? What is the right safety factor
in designing a building? Retailers, homeowners, and engineers typically make 
such decisions using traditionally acquired rules-of-thumb. This tactic has both 
advantages and disadvantages. It can be useful because solving problems from 
scratch is difficult and costly. On the other hand, the uncritical adoption of 
traditional solutions to problems can lead people to acquire outmoded or even 
completely unfounded beliefs. Peasants sometimes resist beneficial innovations 
proffered by development agencies and retain traditional agricultural practices; 
many contemporary Americans maintain the unfounded belief that there are 
innate differences between the members of different ethnic groups. 

The fact that tradition is sometimes reliable and other times misleading 
creates an interesting problem for economists. Traditions often work; when they 
do, they are useful because they reduce the costs of acquiring information and 
lower the possibility of making errors. However, if everyone were to depend 
exclusively on traditional rules, what would cause traditional rules to be modified
in response to changes in the environment, and what would initially cause
useful and reliable behaviors to become traditions? 

Conventional economic theory is not helpful in answering this question 
(Conlisk, 1980). Economists have adopted the Bayesian theory of rational choice 
as the natural extension of the utility-maximizing view of human behavior when 
there is uncertainty and use it as a positive theory to predict people’s behavior 
in a wide variety of contexts (Hirshleifer and Riley, 1978). Within the context of 
this theory, a person’s beliefs about the world are represented as a subjective
probability distribution. Once this distribution is specified, the theory tells us
how rational people should behave and how they should modify their beliefs in 
accord with their experience. The theory does not tell us why people initially 
come to have the beliefs that they do but simply takes them as given. 

The role of traditional knowledge has been discussed by some economists, 
but the processes that lead to sensible traditions seem to have been largely 
ignored. Hayek (1978) believes that limited knowledge and cognitive abilities 
force people to rely on traditional beliefs and values and argues that traditions are 
sensible because groups with favorable traditions survive longer and attract more 
members. Proponents of evolutionary models of firms (Alchian, 1950; Nelson 
and Winter, 1982) assume that beliefs, values, and other determinants of firm 
behavior are transmitted within firms and that these beliefs are shaped by the 
natural selection of firms. The only formal theoretical treatment of tradition 
seems to be the interesting article of Conlisk (1980) in which the individuals 
who optimize compete with individuals who acquire their behavior by imitation. 
If optimization is costly, Conlisk shows that imitation can persist in the population.

In this chapter, we introduce tradition into conventional theory by assuming 
that people acquire their initial subjective probabilities by imitating their parents,
relatives, teachers, business associates, and friends, but otherwise behave as
classical Bayesian rationalists. Several lines of empirical evidence support the
assumption that people acquire their beliefs about the world by imitation and
similar processes. Psychologists have shown that children readily acquire behavioral
traits from moral beliefs to rules of grammar by imitating adult models
(Bandura, 1977; Rosenthal and Zimmerman, 1978). Data collected on familial
resemblances show high parent-offspring correlations for a wide variety of cognitive
traits (I.Q.; Scarr and Weinberg, 1976), behaviors (child abuse, alcoholism;
Smith, 1975), and indicators of beliefs (religious and political-party
affiliation; Fuller and Thompson, 1960). A wealth of anthropological data suggests
that human groups possess considerable cultural inertia; members of
groups with different cultural histories behave quite differently even when living
in similar environments (e.g., Edgerton, 1971). There is also evidence that individuals
acquire new beliefs by imitation when they enter organizations such as
business firms (Van Maanen and Schein, 1979) and that this process causes 
distinct cultures to develop in different organizations. (This body of evidence is 
reviewed in more detail in Boyd and Richerson, 1985:38-60.) 

The assumption that people acquire their beliefs by imitation leads to 
models that keep track of the processes that change the frequency of alternative 
beliefs in a population of decision makers. To understand why a particular 
person acquires a particular set of beliefs, we must know to what kinds of behavior
naive individuals are exposed. This in turn will depend on the distribution
of beliefs (and thus behaviors) that exist in the population. A person in a village 
in which many people have adopted modern farming practices is more likely to 
acquire the beliefs that underlie such practices than a person exposed only 
to traditional lifeways. To predict the distribution of beliefs in the population at 
some future time, we must know the present distribution of beliefs and account 
for all of the processes that change that distribution through time. Here we
present several such models of cultural change. For a more extensive exposition
of our views, see Boyd and Richerson (1985), and for related work, see Pulliam 
and Dunford (1980), Cavalli-Sforza and Feldman (1981), Lumsden and Wilson 
(1981), and Rogers (1989). 

These models are different from Conlisk’s in two important ways: (1) 
Conlisk regards imitation as an alternative to optimization; individuals are either 
imitators or optimizers. We assume that imitation is a precondition for optimization;
everyone must acquire beliefs about the world before they can optimize.
(2) Conlisk simply posits dynamical relations between variables that
describe a whole population of decision makers; we are more concerned to show 
how the details of individual imitation and decision-making processes lead to the 
dynamics of the distribution of beliefs in a population through time. As we shall 
see, the optimal behavior in these models is usually for individuals to mix imitation
and individual decision making, depending on how the temporal dynamics
work out. 

We think that there are three lessons to be drawn from our theory of traditions:
first, there are plausible circumstances in which it is optimal to depend
nearly completely on tradition at equilibrium. Second, there are plausible genetic
and cultural mechanisms that could cause people to achieve this equilibrium.
Third, when people do depend largely on tradition, processes other than
individual choice may have important effects on why people behave the way
they do. We will begin by modeling a reference case in which people acquire
their initial subjective probabilities by imitation and then modify them in accordance
with their own experience in a uniform and constant environment.
This model indicates that when beliefs are transmitted culturally, greater reliance
on tradition always leads to higher expected utility. We will then add
environmental variability to the model. When the optimal behavior varies because
individuals encounter different environments, there is an optimal level of
dependence on tradition. If there is a substantial chance that individuals and the
people that they imitate experience the same environment, and if the information
available to update priors is poor, it can be an evolutionary equilibrium
to rely almost completely on tradition. In the simplest model, a population of
such individuals will, on the average, behave almost as if they were perfect-information
optimizers. However, in such a population other processes, which
can lead to either beneficial (but poorly understood) beliefs or deleterious superstitions,
may also be important. Finally, we will argue that there are cultural
processes that may cause people to be characterized by an optimal reliance on 
tradition. 


The Basic Model 

In the first and simplest model there are only two processes that affect the distribution
of beliefs in a population of decision makers. First, individuals use
available information to update their subjective probability distributions. Second,
the frequency of different beliefs is changed by the transmission of these beliefs
to another generation. The model has three parts: a description of how single
individuals modify their beliefs in light of their experience (a process we refer to
as “individual learning”), a consideration of how individual learning affects the 
distribution of beliefs in a population of individuals, and a mechanism for passing 
one generation’s beliefs to the next. 

Consider the following very simple decision problem. An individual decision 
maker has the following utility function: 

$$u(y, z) = -u_0 (z - y)^2 \tag{1}$$

where z is a decision variable under his control, y is a variable that represents the
state of the world, and $u_0$ is a constant. While the quadratic form of this utility
function is unconventional in the theory of the consumer, it is a mathematically 
convenient representation of the usual view of individual choice. To see this, 
consider the following example: suppose that the decision maker is a young 
professional just beginning his or her career and that z represents the amount of 
time devoted to career advancement. The remainder of the young professional’s 
time, t, is devoted to family and recreation. Then t and z are arguments of 
a personal “production function,” which gives amounts of various “commodities,”
for example, income and marital happiness, produced for each combination
of t and z. The consumption of these commodities in turn generates utility.
By using the constraint that total time is fixed and assuming that the young 
professional’s personal production and utility functions have the appropriate 
convexity properties, one could derive a unimodal function giving utility as a 
function of z. The optimum value of this function, y, would depend on the properties
of the personal production function, which in turn will depend on the
state of the world. For example, the relationship between time devoted to work 
and income might depend on what kind of firm the young professional has 
entered. While the utility function so derived is unlikely to be exactly quadratic, 
this functional form is a reasonable caricature of a more general unimodal 
function. In fact, one could think of it as the first two terms of a Taylor’s series 
expansion of an arbitrary utility function in the neighborhood of the optimum. 
Because we have not specified how commodities map onto utilities, this model 
can represent any degree of risk preference. 
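
To make the Taylor-series remark concrete, here is a brief sketch. The symbol $U$ is introduced here only for illustration; it stands for the unimodal utility function of z described above, assumed twice differentiable with an interior maximum at $z = y$:

```latex
U(z) \approx U(y) + U'(y)(z - y) + \tfrac{1}{2} U''(y)(z - y)^2
     = U(y) - u_0 (z - y)^2,
\qquad u_0 \equiv -\tfrac{1}{2} U''(y) > 0 .
```

The linear term vanishes because $U'(y) = 0$ at the optimum, and the constant $U(y)$ does not affect the choice of z, so dropping it leaves the quadratic form of equation (1).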

The individual does not know the value of y with certainty, but his or her 
beliefs about the likelihood that y takes on various values conform to a normal 
probability distribution with mean $\hat{y}$ and variance L. Note that y is not a random
variable; in a given environment there is an optimum amount of time devoted to 
career. The probability distribution describes the decision maker’s subjective 
beliefs about what value of z is optimum. 

Before making his or her choice, the decision maker has the opportunity to 
review a certain amount of evidence about the state of the world. For example, 
by observing the effects of time devoted to work on career advancement and 
home life, the young professional could get an estimate of the optimal amount 
of time to devote to work. Because our young professional’s initial rate of advancement
and domestic satisfaction might depend on a variety of factors other
than the amount of time devoted to work, this estimate will be imperfect. 
Suppose that this evidence can be quantified by the variable x. The decision 
maker believes (correctly) that the value of x is normally distributed with mean y
and variance $V_e$. After using this evidence and Bayes’s law, the decision maker’s
updated subjective probability distribution is normal with mean $\hat{y}'$, where

$$\hat{y}' = \frac{V_e \hat{y} + L x}{V_e + L} \tag{2}$$

and $\hat{y}$ is the mean of the decision maker’s prior distribution. To simplify the
development here, assume that the decision maker does not
update the variance of his or her subjective probability distribution.

The decision maker uses the updated distribution to calculate his or her
expected utility as a function of z:

$$E\{u(y, z) \mid \hat{y}', x\} = -u_0 \left[ (z - \hat{y}')^2 + L \right] \tag{3}$$

and, thus, the value of z that maximizes his or her expected utility, $z^*$, is the
following:

$$z^* = \hat{y}' \tag{4}$$

That is, the optimal behavior is the individual’s posterior estimate of the most 
likely state of the environment. 
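
The individual-learning step in equations (2) through (4) can be sketched in a few lines of Python. This is only an illustration under the chapter's stated assumptions (normal prior with variance L held fixed, normal evidence with variance V_e); the function and argument names are ours, not the authors'.

```python
def posterior_mean(prior_mean, x, L, V_e):
    """Equation (2): weighted average of the prior mean and the evidence x.

    The prior mean gets weight V_e / (V_e + L); the evidence gets L / (V_e + L).
    """
    return (V_e * prior_mean + L * x) / (V_e + L)


def expected_utility(z, post_mean, L, u0=1.0):
    """Equation (3): subjective expected utility of choosing z.

    The subjective variance L is not updated, as assumed in the text.
    """
    return -u0 * ((z - post_mean) ** 2 + L)


def optimal_choice(prior_mean, x, L, V_e):
    """Equation (4): the utility-maximizing z is the posterior mean."""
    return posterior_mean(prior_mean, x, L, V_e)


if __name__ == "__main__":
    # Noisy evidence (V_e = 9) relative to a tight prior (L = 1):
    # the choice stays close to the traditional value of 10.
    print(optimal_choice(prior_mean=10.0, x=20.0, L=1.0, V_e=9.0))  # 11.0
```

When L is small relative to V_e the inherited belief dominates the choice; when L is large the individual's own observation x dominates.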

Now, suppose that there is a large population of decision makers. The individuals
who make up this population differ in only two respects: (1) they have
different prior beliefs about the most likely state of the world, and (2) they are
exposed to different evidence about the state of the world. To formalize the first
assumption, we assume that the frequency distribution of $\hat{y}$ (the mean of an
individual’s prior) in the population
before the subjective probability distributions have been updated, $Q_t(\hat{y})$, is
normal with mean $M_t$ and variance $B_t$. Notice that this is a description of the
population, not a probability density. To formalize the second assumption, we assume
that the value of x experienced by each different individual is an independent
random variable with the density $p(x)$, which has a mean equal to the
true state of the world, y, and variance $V_e$. Otherwise, all individuals are identical;
in particular, they all have the same utility function and their subjective
probability distribution is characterized by the same value of L.

Let us now consider how the use of Bayes's law by individuals to modify
their beliefs changes the frequency distribution of $\hat{y}$ in the population. The
distribution of $\hat{y}$ in the population of decision makers after updating, $Q'_t$, is as
follows:

$$Q'_t(\hat{y}') = \iint h(\hat{y}' \mid \hat{y}, x)\, Q_t(\hat{y})\, p(x)\, d\hat{y}\, dx \tag{5}$$

where $h(\hat{y}' \mid \hat{y}, x)$ is the conditional density of an individual’s belief after updating,
given that the individual had beliefs characterized by $\hat{y}$ before updating and
observed x. Then $Q'_t(\hat{y}')$ is normal with this mean:

$$M'_t = \frac{M_t V_e + y L}{V_e + L} \tag{6}$$

and variance:

$$B'_t = \frac{B_t V_e^2 + V_e L^2}{(V_e + L)^2} \tag{7}$$

Thus, after updating, the mean value of $\hat{y}$ moves closer to the correct value, y;
the variance may either increase or decrease depending on the magnitudes of $B_t$,
$V_e$, and L.
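
As a rough check on equations (6) and (7), one can simulate a large population in which every individual applies the updating rule of equation (2) and compare the simulated moments with the formulas. The sketch below uses only the Python standard library; the function names, the population size, and the parameter values in the usage note are illustrative assumptions, not values from the chapter.

```python
import random
import statistics


def updated_moments(M_t, B_t, y, L, V_e):
    """Equations (6) and (7): population mean and variance after updating."""
    M_new = (M_t * V_e + y * L) / (V_e + L)
    B_new = (B_t * V_e**2 + V_e * L**2) / (V_e + L) ** 2
    return M_new, B_new


def simulate_update(M_t, B_t, y, L, V_e, n_individuals=200_000, seed=0):
    """Monte Carlo version: draw priors and evidence, apply equation (2)."""
    rng = random.Random(seed)
    posteriors = []
    for _ in range(n_individuals):
        prior_mean = rng.gauss(M_t, B_t ** 0.5)  # belief acquired before learning
        x = rng.gauss(y, V_e ** 0.5)             # noisy private evidence
        posteriors.append((V_e * prior_mean + L * x) / (V_e + L))
    return statistics.fmean(posteriors), statistics.pvariance(posteriors)


if __name__ == "__main__":
    print(updated_moments(0.0, 4.0, 10.0, 1.0, 3.0))  # (2.5, 2.4375)
    print(simulate_update(0.0, 4.0, 10.0, 1.0, 3.0))  # close to the line above
```

With these illustrative values a population that starts with variance 4 ends with a smaller variance, whereas a population that starts with no variation at all (B_t = 0) ends with variance 3/16, which shows how updating can move the variance in either direction.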

So far, we have followed the usual practice of taking the decision maker’s 
initial subjective probabilities as given. We are now in a position to consider the 
effect of the transmission of these beliefs to another “generation” of decision 
makers by imitation. For example, suppose that the young professionals advance 
in their firm and are eventually replaced by a new cohort of entry-level professionals,
who form a new population of decision makers and face the same decision
problem that their predecessors faced. Initially the individuals in this
second “generation” are naive; they have no beliefs of any kind about how much 
time should be devoted to work. However, each naive individual has been able 
to observe n models of behavior of the previous generation of professionals. 
Based on the behavior of their models, naive individuals are able to infer what 
each model believes about how much time should be devoted to one’s profession.
Then each of the naive individuals adopts the mean of the n inferred values
of $\hat{y}$ that characterize their models as the mean of their own subjective probability
distributions. We assume that the variance, L, remains constant at the
same value as in the previous generation. 

With these assumptions the distribution of $\hat{y}$ in the population just before
updating in generation t + 1, $Q_{t+1}(\hat{y})$, is normal with mean $M_{t+1} = M'_t$ and
variance $B_{t+1} = (1/n) B'_t$. Because the distribution of $\hat{y}$ remains normal, the state
of the population of decision makers at any time can be specified by the mean and
variance of $\hat{y}$. If the environment remains constant, the values of the mean and
variance in the population will eventually reach a unique stable equilibrium, M
and B, where

$$M = y \tag{8}$$

and

$$B = \frac{V_e L^2}{n (V_e + L)^2 - V_e^2} \tag{9}$$


Equations (6) and (8) say that the effect of the repeated application of Bayesian
inference and accurate imitation on the mean value of $\hat{y}$ is unambiguous: the
average of the best guesses about the state of the environment in the population
converges monotonically to the actual state of the environment. According to (7)
and (9), however, the variance of $\hat{y}$ is affected by competing processes. New
variation is introduced each generation by errors in individual learning; this
process acts to increase B. On average, however, inference causes beliefs about
the environment to become more accurate, and this decreases B. Finally, if n > 1,
imitation itself acts to decrease the variance, B, in the population.
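
The approach to equilibrium can be illustrated by iterating the learning and imitation steps directly. The following is a minimal sketch, assuming a constant environment and that each naive individual averages n independently chosen models; all names and the numbers in the usage note are ours.

```python
def next_generation(M_t, B_t, y, L, V_e, n):
    """One round of individual learning (eqs. 6-7) followed by imitation."""
    M_updated = (M_t * V_e + y * L) / (V_e + L)
    B_updated = (B_t * V_e**2 + V_e * L**2) / (V_e + L) ** 2
    # Averaging n models leaves the population mean unchanged and
    # divides the variance by n.
    return M_updated, B_updated / n


def equilibrium_variance(L, V_e, n):
    """Closed-form equilibrium variance, equation (9)."""
    return V_e * L**2 / (n * (V_e + L) ** 2 - V_e**2)


def iterate(M_0, B_0, y, L, V_e, n, generations=200):
    """Apply the recursion repeatedly, starting from (M_0, B_0)."""
    M, B = M_0, B_0
    for _ in range(generations):
        M, B = next_generation(M, B, y, L, V_e, n)
    return M, B


if __name__ == "__main__":
    M, B = iterate(M_0=0.0, B_0=4.0, y=10.0, L=1.0, V_e=3.0, n=2)
    print(M, B)                                # M near 10, B near 3/23
    print(equilibrium_variance(1.0, 3.0, 2))   # the value from equation (9)
```

Starting far from the truth, the mean closes a fixed fraction L / (V_e + L) of the remaining gap each generation, and the variance settles at the value given by equation (9).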


The Evolutionarily Stable Amount of Tradition

The relative importance of tradition and individual learning is determined by the 
relative magnitudes of the width of each individual’s initial prior probability
distribution (L) and the quality of the information available to individuals ($V_e$).
If L is small compared to $V_e$, young professionals’ work habits will be mostly
determined by the beliefs that they acquire by imitation. If L is large, the information
that individuals gather for themselves will be more important.

In this section we determine the evolutionarily stable, or ESS, value of L. To
do this, we find the value of L that when common in a population has higher 
expected utility than slightly different values of L. One way to justify the ESS 
approach is to assume that L is a genetically variable character and that utility is 
monotonically related to fitness. The ESS value of L is the value that prevents the 
rare genotypes from invading under the influence of natural selection. Some 
models of cultural transmission have very similar properties to genetic ones, and 
for our immediate purposes, we can think of L as evolving under the influence of 
either process. Clearly, cultural and genetic transmission also differ in important 
ways, for example, in the timescale over which they are relevant. Variations in 
reliance on tradition among contemporary societies likely require a cultural 
explanation, while a genetic model would be appropriate for studying the evolution
of humans from apes. The penultimate section of the chapter will address
several explicitly cultural mechanisms that can lead to the ESS. 

Consider a population in which most individuals have a learning rule 
characterized by the parameter value, L, and that has reached the associated 
equilibrium values M and B. The expected utility of an individual whose learning 
rule is characterized by parameter L' is the following: 


$$E\{u(y,x)\} = -U_0 - \frac{V_e^2}{(V_e + L')^2}\left[(y - M)^2 + B\right] - \frac{L'^2 V_e}{(V_e + L')^2} \qquad (10)$$

One can show that this expression for expected utility is concave with a global
maximum at the value of L',

$$L' = (y - M)^2 + B \qquad (11)$$

The term $(y - M)^2 + B$ measures the closeness of the population’s beliefs about
the state of the world to its actual state; $V_e$ measures the accuracy of the
information gained by each individual through his own experience. Relation (11)
(together with [1]) says individuals should rely on imitation in proportion to the
accuracy of the distribution of beliefs. If $(y - M)^2 + B$ is large compared to $V_e$,
individuals should rely mainly on their own experience; if $(y - M)^2 + B$ is small
compared to $V_e$, then it is optimal to depend mainly on imitation. This expression
does not depend on the assumption that the population is in equilibrium
or that the environment is constant.
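
Relation (11) can be illustrated with a small numerical check. The sketch below assumes that expected utility takes the quadratic-loss form -U_0 - V_e^2[(y - M)^2 + B]/(V_e + L')^2 - L'^2 V_e/(V_e + L')^2, which is our reading of equation (10); the function names, the grid, and the parameter values are hypothetical.

```python
def expected_utility(L_prime, M, B, y, Ve, U0=0.0):
    """Expected utility of an individual using L' in a population with mean M, variance B."""
    w = Ve / (Ve + L_prime)              # weight placed on the imitated belief
    return -U0 - w**2 * ((y - M)**2 + B) - (1 - w)**2 * Ve


def best_L(M, B, y):
    """Optimal L' according to relation (11)."""
    return (y - M)**2 + B


M, B, y, Ve = 0.4, 0.3, 1.0, 2.0         # hypothetical population state and noise level
L_opt = best_L(M, B, y)

# Brute-force search over a grid of candidate values of L'.
grid = [i / 1000 for i in range(1, 5001)]
L_grid = max(grid, key=lambda L: expected_utility(L, M, B, y, Ve))

print(L_opt, L_grid)
```

The grid search and relation (11) pick out the same value, and that value does not involve V_e, in line with the statement that (11) holds whether or not the population is at equilibrium.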

Now, suppose that natural selection, or an analogous cultural process, favors
values of L that increase expected utility. Then because B is a function of L, the
population will eventually reach an ESS value of L, L*, such that L* = B(L*). Using
the expression for B given in equation (9), one can show that the ESS value
of L is L* = 0. At equilibrium, individuals will depend completely on tradition
and totally disregard the evidence presented by their own experience.

This result has an intuitive explanation. At equilibrium, the relative merit of 
tradition and learning depends on the relative “noisiness” of the two sources of
information. Learning has two effects on the variance in the population. On
average, learning causes individuals’ estimates of y to move toward the correct
value and thus acts to reduce the variation in the population. However, errors
made during learning increase the variation of the population. Once the population
reaches equilibrium in a constant environment, the net effect of learning is
to maintain erroneous beliefs in the population. Decreasing L always decreases 
B. Thus, any process that acts to change L so as to increase expected utility will 
reduce L until experience plays no role in determining individual beliefs. 


Heterogeneous Environments 

There are good reasons to doubt the robustness of the conclusion of the previous 
section. So far, we have assumed that (1) every member of the population 
experienced the same state of the world, (2) the state of the world did not vary 
from generation to generation, and (3) all individuals had the same utility 
function. Relaxing any one of these assumptions reduces the usefulness of tradition.
For example, consider a heterogeneous environment in which different
individuals experience different states of the world, but in which there is some
chance that individuals in one environment draw models from other environments.
In a given environment, people’s beliefs will tend toward the optimum
in that environment, but drawing models from diverse environments will reduce 
the likelihood that an individual acquires beliefs that are appropriate to its 
own environment. The models in this section show that a substantial reliance on 
tradition may still be evolutionarily stable in a heterogeneous environment or in 
a population in which utility functions vary. We have shown elsewhere that this 
conclusion also holds true in an environment that changes through time (Boyd 
and Richerson, 1983, 1985: ch. 4). 

The essential feature of a heterogeneous environment is that different individuals
in the population experience different states of the world, formalized
in terms of the value of y. Such variation might arise for many reasons. For 
example, different young professionals might work in different firms, practice 
different professions, or live in different regions. We will model heterogeneous 
environments by assuming that the probability that an individual in the population
experiences the environment specified by the value y is given by a normal
density function, f(y), with mean 0 and variance H. Setting the mean to 0 can be 
done without loss of generality since it sets only the origin from which different 
environments are measured. The variance, H, is a measure of the amount of 
environmental variation. 

Suppose that in the environment characterized by the value y, the frequency 
of individuals with a subjective probability distribution characterized by a mean 
y before updating is normal with mean $M_t(y)$ and variance $B_t(y)$. Then the mean
and variance after updating in that environment are given by equations (6) and
(7) with the appropriate value of y. Further, suppose that there is a probability
$1 - m$ that given models experience the same environment that their naive
imitators will experience and a probability, m, that models are drawn at random
from the population as a whole. Thus, for example, some of a particular young
professional’s models might be drawn from another firm in which more (or less)
dedication is required to succeed. This model also applies to a population of 
individuals who live in a uniform environment but whose utility functions have 
different optima. 

With these assumptions, one can derive recursions for the mean and variance
of the distribution of prior beliefs in each environment. One can show that
the equilibrium mean in habitat y is the following:


$$M(y) = \frac{(1 - m)\,yL}{mV_e + L} \qquad (12)$$


Equation (12) says that in a heterogeneous environment on average individuals 
have incorrect beliefs about their environment. The mean value of y in any 
environment y results from the balance of two forces. The Bayesian learning 
process tends to move the mean toward the correct value for that environment, 
but the exposure to models drawn from other environments moves the mean 
toward the mean for the entire population, 0. To find the equilibrium variance, 
we proceed exactly as in the previous section. 
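
The balance of forces behind equation (12) can be sketched numerically. The code below is an illustration under stated assumptions: learning within a habitat is taken to follow M' = (V_e M + L y)/(V_e + L), which is our assumed form of equation (6), and models drawn from the whole population are taken to have mean belief 0, which holds by symmetry when habitats are distributed with mean 0. The function names and parameter values are hypothetical.

```python
def equilibrium_mean(y, L, Ve, m):
    """Equilibrium mean belief in habitat y, as given by equation (12)."""
    return (1 - m) * y * L / (m * Ve + L)


def iterate_mean(y, L, Ve, m, generations=2000):
    """Alternate learning within habitat y and mixing with the population at large."""
    M = 0.0
    w = Ve / (Ve + L)                       # weight placed on the imitated belief
    for _ in range(generations):
        M_post = w * M + (1 - w) * y        # learning pulls the mean toward y
        M = (1 - m) * M_post + m * 0.0      # foreign models pull it toward 0
    return M


y, L, Ve = 2.0, 0.5, 1.0
for m in (0.0, 0.1, 0.5):
    print(m, iterate_mean(y, L, Ve, m), equilibrium_mean(y, L, Ve, m))
```

With m = 0 the mean belief matches the local optimum y; as m grows, the equilibrium mean is drawn toward the population-wide mean of 0.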

By averaging the expressions for the equilibrium mean and variance over all
habitats, and using the expression for the ESS value of L given by equation (11),
one can calculate L* in a heterogeneous environment. The results of this calculation
are shown in figure 18.1, which plots the relative importance of imitation
in determining behavior, $V_e/(L^* + V_e)$, as a function of $V_e$ for several



Figure 18.1. Plot of the fractional importance of tradition in determining behavior
when the propensity to rely on tradition is at its equilibrium value, $V_e/(L^* + V_e)$,
as a function of the quality of information available to individuals ($V_e$) assuming a
heterogeneous environment, n = 1 and H = 1.0. Increasing values of m represent
increasing amounts of mixing of models among different environments.

values of m. This figure indicates that the equilibrium optimum amount of
imitation increases as the quality of the information available through individual 
experience declines and as the probability that models are drawn from foreign 
environments decreases. 

These results make sense. The amount of imitation favored by evolutionary 
processes depends on the relative quality of two sources of information, the 
information available to individuals through their own experience and through 
observing the behavior of their models. As $V_e$ increases, the quality of the
information available to individuals through experience declines. As m decreases,
the probability that an individual’s models will exhibit behavior that is appropriate
in the local environment increases. Thus, both increasing $V_e$ and decreasing
m cause the equilibrium reliance on imitation, $V_e/(L^* + V_e)$, to increase.

These results suggest that the conclusions of the first section are not entirely 
misleading. When the amount of mixing between environments is not too large 
and information is of low quality, individuals achieve the highest expected utility 
by relying mainly on tradition. We think that this combination of circumstances is 
not uncommon. The world is complicated and poorly understood and the effects 
of many decisions are experienced over the course of a lifetime. In deciding how 
much time to devote to their families, young professionals must estimate not only 
the immediate effect on their careers and home lives but also the long-run effects
on the development of their children’s adolescent behavior. In such cases the 
information available to individuals may be very poor indeed, and it is plausible 
that they are best off relying almost entirely on traditional beliefs. Also notice that 
figure 18.1 is a worst case for tradition because it assumes that there is only one 
model (n = 1). As n increases, the equilibrium variance within environments
decreases, and, therefore, tradition is relatively more reliable. 

It is important to note that even when the amount of individual learning is 
small, it plays an important role in the evolutionary dynamics of the population. 
Some individual learning is necessary if traditional beliefs are to remain utilitarian 
in local environments in the face of imitation of experienced individuals from 
other environments. However, a relatively small amount of individual learning is 
sufficient to keep traditional behaviors on average reasonably near utilitarian 
optima, so long as mixing between heterogeneous environments is not too great. 


Biased Imitation 

To this point, we have assumed that individuals adopt a simple unbiased average 
of the beliefs of the models to which they are exposed. This may not be the most 
sensible procedure. It would seem better to preferentially imitate models whose 
behavior has been successful. Young professionals might imitate models who 
are particularly accomplished in their work and content in their private lives. 
More generally, naive individuals may imitate prosperous models, contented 
models, prestigious models, or devout models. By doing this, naive individuals 
will be more likely to acquire beliefs that lead to prosperity, devotion, contentment,
or prestige. In this section we show how this form of biased cultural
transmission can increase the frequency of correct beliefs in a population, even
when individuals do not understand the causal connection between beliefs and
their consequences. 

Suppose that instead of simply averaging the beliefs of their models, naive
individuals weight models according to their utility, with models achieving higher
utility having a greater influence on a naive individual’s initial belief than
individuals with lower utility. There are many plausible observable correlates of
utility, such as level of consumption. It seems likely that by imitating individuals 
with higher levels of consumption, naive individuals might increase their chances 
of acquiring beliefs that lead to higher utility. In particular, suppose that the 
initial value of y acquired by a naive individual exposed to models with the
utilities $u_1, \ldots, u_n$ and beliefs $y_1, \ldots, y_n$ is given by this expression:


$$y = \frac{\sum_{i=1}^{n} y_i (1 + b u_i)}{\sum_{i=1}^{n} (1 + b u_i)} \qquad (13)$$


where b is a positive constant small enough that terms of order $b^2$ can be ignored.

With this assumption, it can be shown (Boyd and Richerson, 1985) that the
mean in the population after transmission is the following:


$$M_{t+1} = M'_t + (1 - 1/n)\,B'_t\,E\{\mathrm{Reg}[y, u(y)]\} \qquad (14)$$

where $E\{\mathrm{Reg}[y, u(y)]\}$ is the regression of utility on y averaged over all possible
sets of models. According to equation (14), the change in the mean due to biased
transmission depends on two factors: the amount of variability within sets of
models, $(1 - 1/n)B'_t$, and the extent to which beliefs about the world are predictably
related to utility, $E\{\mathrm{Reg}[y, u(y)]\}$. Variability within sets of models is
important because biased transmission is a culling process that works because
some models are more attractive than others. If all models are identical, biased
transmission can have no effect. The regression of utility on y is a measure of the
average effect of a change in an individual’s beliefs on his or her utility. If it is
positive, individuals with larger values of y will have higher utility and, therefore,
be more likely to be imitated. This will cause the mean value of y in the population
to increase. Both the sign and the magnitude of $E\{\mathrm{Reg}[y, u(y)]\}$ depend on
the distribution of y in the population. If $M_t$ is less than the optimum value (y),
larger values of y will on average lead to higher utility, and the regression will be
positive. The reverse will occur if $M_t > y$. This means that biased transmission
will leave the mean unchanged only if it is at the optimal value.
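
The direction of the effect described here can be seen in a small example. The sketch below applies the weighting rule of equation (13) to one hypothetical set of three models; the quadratic utility function, the value of b, and the function names are illustrative assumptions, not part of the model as stated.

```python
def biased_belief(beliefs, utilities, b):
    """Equation (13): average of model beliefs, each weighted by 1 + b * u_i."""
    weights = [1 + b * u for u in utilities]
    return sum(y * w for y, w in zip(beliefs, weights)) / sum(weights)


def utility(y, optimum):
    """Hypothetical utility: falls off with squared distance from the optimum."""
    return -(y - optimum) ** 2


models = [0.2, 0.5, 0.9]                 # beliefs of one set of n = 3 models
unbiased = sum(models) / len(models)

# Optimum above the models' mean: utility rises with y, so the bias pushes upward.
above = biased_belief(models, [utility(y, 1.0) for y in models], b=0.05)
# Optimum below the models' mean: utility falls with y, so the bias pushes downward.
below = biased_belief(models, [utility(y, 0.0) for y in models], b=0.05)
# Identical models: there is no variation for the bias to act on.
same = biased_belief([0.5, 0.5, 0.5], [utility(0.5, 1.0)] * 3, b=0.05)

print(unbiased, above, below, same)
```

In each case the naive individual's belief shifts toward the optimum only to the extent that the models differ from one another, which is the role played by the variability term in equation (14).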

Biased transmission is of interest because it can explain the existence of 
“folk wisdom,” beneficial but poorly understood customs. The preferential 
imitation of successful people will tend to increase beliefs and practices that lead 
to success; there is no need for individuals to understand the causal connection 
between traditional practice and success, even on the part of the individuals who 
invent the practices. 


Natural Selection 

So far we have assumed that the probability that a naive individual is exposed to 
models who are characterized by given beliefs (i.e., a given value of y) is equal to
the frequency of that kind of individual in the previous generation. There is good
reason to suppose that this assumption is often violated. For example, the 
probability that young professionals are advanced in their firm is likely to depend 
on how much time they devote to work. Underachievers are likely to be fired 
and overachievers to be promoted. Thus, models who are available for imitation 
within a firm may represent a biased sample of the original population. More 
generally, if the behaviors that are shaped by the beliefs acquired by imitation are 
important, they may affect many aspects of individuals’ lives: whom they meet, 
how long they live, how many children they have, or whether they get tenure. 
All of these factors could affect the probability that an individual becomes 
available as a model for others. This means that individuals characterized by 
some values of y will end up being more likely to be imitated than individuals 
with other values. All other things being equal, it is intuitive that this process, 
which we will term “natural selection” because of its close resemblance to the 
biological process, will increase the frequency of the variants most likely to 
“survive” to enter the pool of models. For a more extensive discussion of the 
natural selection of culturally transmitted behaviors, see Boyd and Richerson 
(1985:173-203). 

To formalize this idea, we suppose that the probability that an individual 
who chooses behavior z becomes available as a model, W(z), is the following: 


$$W(z) = \exp\left\{-\frac{(z - w)^2}{2K}\right\} \qquad (15)$$


where w is the behavior that maximizes the probability of being in the model pool
and 1/K is a measure of the intensity of the selection process. Note that w need not
equal y; for example, individuals who devote more than the utility maximizing 
amount of time to their work may be more likely to be promoted within the 
firm. 

Using (15) one can show that the mean value of y in the population of
models (after selection), $M''$, is given by this equation:

$$M'' = \frac{M'K + wB'}{B' + K} \qquad (16)$$


Thus, selection moves the mean value of y in the population toward the value 
that maximizes the probability of entering the pool of models, w. One can also 
show that it reduces the variance of y in the population. The strength of both 
these effects is proportional to the variance in y in the population and the 
intensity of the selection process. 
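
The effect of selection on the pool of models can be checked numerically. The sketch below assumes that behavior z equals the belief y, that beliefs before selection are normal with mean M and variance B, and that the mean after selection is (MK + wB)/(B + K), which is our reading of equation (16). The variance formula BK/(B + K) is not stated in the text; it is the standard result for the product of two normal curves and is included here as an assumption. All names and values are hypothetical.

```python
import math


def select(M, B, w, K):
    """Mean and variance of beliefs among models after selection."""
    M_sel = (M * K + w * B) / (B + K)    # our reading of equation (16)
    B_sel = B * K / (B + K)              # assumed: product of two normal curves
    return M_sel, B_sel


def select_numerically(M, B, w, K, points=20001, span=12.0):
    """Weight a grid of beliefs by their frequency times W(z) from equation (15)."""
    sd = math.sqrt(B)
    ys = [M + span * sd * (2 * i / (points - 1) - 1) for i in range(points)]
    weights = [
        math.exp(-(y - M) ** 2 / (2 * B)) * math.exp(-(y - w) ** 2 / (2 * K))
        for y in ys
    ]
    total = sum(weights)
    mean = sum(y * wt for y, wt in zip(ys, weights)) / total
    var = sum((y - mean) ** 2 * wt for y, wt in zip(ys, weights)) / total
    return mean, var


M, B, w, K = 0.0, 1.0, 2.0, 4.0
print(select(M, B, w, K))
print(select_numerically(M, B, w, K))
```

Both calculations move the mean part of the way from M toward w and shrink the variance, and the shift is larger when B is large or K is small.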

Natural selection is important because it explains how a reliance on tradition 
can lead to erroneous or deleterious beliefs. Many social and economic processes 
affect the kinds of individuals available as models. Some of these processes act on 
the level of the individual, as in the case of the young professional. Others affect 
whole firms or institutions. For example, firms composed of overachievers may 
be more likely to survive and expand than firms composed of utility maximizers. 
When culturally acquired beliefs are important in determining people’s behavior,
these selective processes will affect what kinds of people are available for
imitation and therefore what beliefs will characterize the population. Since there
is no reason to believe that such selective processes always favor utility maximizing
behavior, selection may cause the most common beliefs in a population
to be deleterious. Nonetheless, if information is imperfect and costly to acquire, 
it may still be sensible to rely on tradition; a modest systematic error may be 
preferable to a larger random error. 

As an aside, we could also interpret the case of a naive manager being socialized
by overachievers as the acquisition of a new utility function by considering
that preferences are transmitted by tradition and modified by evolutionary
processes such as selection. Such a model would allow a more general account of 
the relationship between learning and tradition than the Bayesian framework 
used here permits in order to reflect other models of the decision-making process 
(e.g., Nelson and Winter’s, 1982, evolving “routines”). To enlarge on these problems
is, however, outside the scope of this chapter. Here we want to emphasize
that the standard, and normatively appropriate, Bayesian model is incomplete 
without a theory of tradition. 


Cultural Mechanisms Leading to the ESS Amount of Imitation 

So far we have assumed that natural selection acting on genetic variation or an 
analogous cultural process causes the value of L to change in the direction of increasing
expected utility. In this section we consider such cultural processes
in more detail. Suppose that the relative dependence on tradition versus one’s 
own experience itself is a culturally transmitted trait. Then each of the three 
mechanisms we have just studied can, under the right circumstances, act like 
natural selection to change L in the direction that increases expected utility. 

First, however, it is important to clarify why, within the context of the 
model outlined so far, it is not possible for individuals to choose directly the 
appropriate value of L. An essential assumption of this chapter is that the information
available to individuals is limited; they know the results of their own
direct experience and the observable behavior of the individuals whom they had
available to imitate, but they do not know the optimum behavior, y. From equation
(11), the optimal value of L is given by the term $B + (M - y)^2$.
Individuals can estimate B and M from their sample of models, and under some 
circumstances this information might be sensibly used to modify L. They cannot 
choose the optimum value of L, however, because that value depends on how 
close the mean belief in the population is to the optimum, y. 

How do people acquire their attitudes toward tradition? Assume that 
people acquire their value of L by imitation during an earlier episode of social 
learning. With this assumption, any of the processes that change the frequency 
of a culturally transmitted trait could affect the evolution of the mean value of L 
in the population: 

1. Ordinary learning. Individuals might acquire an initial value of L by 
imitation or teaching and then modify it in accordance with their 
experience. For example, during enculturation, individuals must acquire
many different beliefs and behaviors. They might experiment
with different values of L during early episodes of learning, retaining
the value that seems to yield the best results. This process
would change the mean value of L among members of the population
in the direction that increased average utility.

2. Biased transmission. Suppose that available models are variable, some 
of them relying on tradition to a greater degree than others. Moreover,
suppose that naive individuals can observe some behavior of
their models that serves as a useful index of the model’s utility. Then 
if naive individuals are predisposed to imitate successful models, the 
mean value of L in the population will move toward the optimum. 
Notice that this can be true even if, as we have assumed, individuals 
have no understanding of why certain beliefs lead to higher utilities. 

3. Natural selection. Once again assume that individuals vary in their 
attitudes toward tradition. Individuals with different values of L 
will, on average, behave differently. If an individual’s behavior 
affects the probability that he or she becomes a model, natural 
selection will change the mean value of L in the direction that 
increases the chance of acquiring behaviors that make an individual 
likely to become a model. To the extent that there is a correlation 
between the utility associated with a behavior and the probability 
that an individual with the same behavior will become a model, 
natural selection would modify L in a utility maximizing direction. 

To see how these processes might work, consider how attitudes toward tradition 
might change as a society undergoes industrialization. It is often thought that in 
pre-industrial agricultural societies people rely heavily on tradition. If one supposes
that in such societies information is costly, then their reliance on tradition
is sensible according to our model. Now, suppose that during industrialization, 
technical and institutional change makes information less costly. According to 
the model, people would be better off if they relied more on their own experience
and less on tradition. This might come about by any of the three processes
mentioned. To some extent, individuals might have been able to infer from their 
own experience that a lower reliance on tradition improved their lot. More 
plausibly, during industrialization people with a tendency to rely more on their 
own experience and less on traditional beliefs might more readily acquire nontraditional
skills that lead to wealth and other kinds of observable markers of
success. If successful individuals are more likely to be imitated, biased transmission
would decrease average reliance on tradition. Or less traditional individuals
might simply be more successful at becoming teachers, managers, and
bureaucrats in modernizing societies. The natural selection mechanism could
have favored a reduced dependence on tradition through differential achievement
of roles that are important in socialization.

Invoking processes that affect earlier episodes of imitation to understand the 
nature of a subsequent episode clearly creates a problem of explanatory regress. 
Each of the three processes mentioned depends on some aspect of the imitation 
process, which then must be explained. In the case of ordinary learning, individuals
must have some way of weighting the importance of the value of L that
they acquired by imitation against the value that their experience indicates is
best. Do they rely on their experience or on imitation? In the case of biased 
transmission, individuals must have some criteria of success—do they imitate 
wealthy individuals? Content individuals? Even natural selection will differ in its 
effects depending on whom naive individuals are prone to imitate. Are they 
disproportionately affected by their parents, or are other individuals important? 

Ultimately, these are questions about human nature. The answers must be 
sought in the long-run processes that govern the interactions of cultural and 
genetic evolution in our species. This topic has been discussed at length by us 
(Boyd and Richerson, 1985) and others (Pulliam and Dunford, 1980; Lumsden 
and Wilson, 1981; Durham, 1978). Our work supports two generalizations that 
are relevant here: 

1. If there is genetic variation that affects the tendency of people to 
imitate, natural selection will tend to modify this tendency so that it 
maximizes genetic fitness. Thus, to the extent that people prefer 
fitness-enhancing outcomes, selection would increase average utility. 

2. There are a variety of conditions in which the fitness-maximizing 
values of L are near 1. Thus, it is plausible that even the earliest 
episodes of imitation are not directly subject to genetic influences. 


Discussion 

The economic theory of rational choice under uncertainty is incomplete because 
it is silent about the source of people’s initial beliefs about the world. People are 
not immortal; sometime between birth and adulthood they acquire a set of 
beliefs about the world. Because rational behavior, including the rational response
to new information, depends on the nature of an individual’s prior beliefs,
virtually any behavior can be rational, and therefore explicable, given some set of 
prior beliefs. A peasant’s initial resistance to a beneficial innovation is explicable 
if one supposes that he believes that traditional ways are superior to modern 
ones. His ultimate rejection of modern practices may also be rational if his beliefs 
are described by “tight” priors. 

In this chapter we have extended the economic theory of choice under uncertainty
by assuming that individuals acquire their initial subjective probability
distribution by imitation. In particular, we supposed that each naive individual 
observes the behavior of a number of experienced models sampled from a larger 
population, induces the belief that led to the observed behavior, and then adopts 
an average of those beliefs as his own initial beliefs. Then to understand why 
people acquire the initial beliefs that they do, we must understand why the 
population is characterized by a particular distribution of beliefs. This means that
models that allow for imitation must account for all of the processes that change
the distribution of beliefs in the population. Some of these processes will arise
from individual learning and decision making, while others result from social and
economic processes that have different effects on people with different beliefs.

This amendment to economic theory is not proposed as a behavioral alternative
to the usual assumption that people are rational optimizers. Whether they
are optimizers or not, mortal individuals must acquire their initial beliefs from
others. It may well be that the particular model of imitation we have chosen is
incorrect, that Bayesian optimizing is a poor model of how humans make
choices, or that genetic inheritance is important in determining people’s behavioral
predispositions. In any case, we believe that a complete theory of human
behavior would have a similar structure to the models outlined here; it
would keep track of the dynamics of a population of decision makers by accounting
for the processes that change the distribution of beliefs or other predispositions
in the population. Some of these processes will result from people’s
attempts at improving their lot, while others will result from what happens to 
them because they hold the beliefs that they do. 

There are two lessons that can be drawn from the models presented here.
First, they suggest that a strong reliance on tradition may indeed be sensible. At
equilibrium, individuals may rely almost entirely on traditional knowledge and
ignore any other information that may be available to them. When (1) the
quality of information available to individuals is low and improving it is costly, and
(2) there is a good chance that the individuals’ models experienced the same
environment that they experience, traditional solutions to problems may be
much closer to the optimal behavior, on the average, than the solutions that
individuals could devise on their own.

The theory also suggests, however, that when traditions are substantially 
more important in determining people’s beliefs than their own experience, a 
variety of processes other than individual learning may affect the commonness of 
different beliefs. When tradition is important, it acts like a system of inheritance 
to create heritable variation within and among groups. Processes like biased 
transmission and natural selection can then affect the frequency of different 
beliefs by making it more likely that some beliefs will be transmitted from one 
generation to the next. When the effect of individual experience is small, it is 
plausible that such processes may have an important effect on the way that 
people behave. 

Some of these processes, such as biased transmission, may increase the 
frequency of utility-enhancing behaviors. This fact is of interest because it may 
explain “folk wisdom,” that is, the fact that people hold beneficial traditions that 
they do not understand. The most striking examples of folk wisdom come from 
anthropological research. For example, in many parts of the New World native 
peoples treated maize with a strong base to produce foods such as hominy or masa
as part of their traditional cuisine. Katz, Hediger, and Valleroy (1974) have 
shown that such treatment makes more of the amino acid lysine available (lysine 
is the least plentiful amino acid in maize). They have also shown that there was a 
strong negative correlation between the use of alkali treatment and the availability
of protein from sources other than maize. Given that many factors influence
nutrition, and that only small, uncontrolled samples were available, it is
difficult to see how individuals in these cultures could have detected the effect of 
the treatment. Indeed, although Africans have been using maize as a staple for a 
few centuries, alkali cooking has not yet developed there. It seems more likely 
that it could spread because eating treated maize made people more successful
or more likely to survive and, therefore, more likely to be imitated. Folk wisdom
also plays a role in economic thinking. Hayek (1978) argues that traditional
beliefs and institutional arrangements reflect wisdom beyond the ken of
any individual, and he bases many political and economic prescriptions on this 
view. Similarly, proponents of an evolutionary view of the firm (e.g., Alchian, 
1950; Nelson and Winter, 1982) argue that inherited decision rules that determine
a firm’s response to market conditions may be sensible in ways that nobody
in the firm understands. 

However, for other processes that affect the frequency of alternative beliefs 
in a population, such as natural selection, there is no guarantee that utility- 
maximizing behaviors will be favored. This may explain the existence of behavior 
that seems paradoxical under the usual assumption of individual rationality. In 
our example of natural selection on behaviors transmitted in the workplace, 
people could come to work harder than they would desire. Such behaviors could 
remain in a population because on average the traditions transmitted within a 
firm are more useful than alternative behaviors individuals could acquire by their 
own efforts. In other words, a reliance on tradition causes individuals to trade 
systematically suboptimal behaviors transmitted within the firm for the randomly 
suboptimal ones that can be discovered by individual effort. Elsewhere we 
show that processes other than natural selection can have this general effect 
(Boyd and Richerson, 1985). 

Finally, models of the kind described here may also be useful in clarifying the 
relationship between human evolution and contemporary human behavior. 
Hirshleifer (1977) has argued that one of the attractive features of sociobiological 
theory is that it provides an independent way to derive utility functions; 
namely, human preferences have been shaped by natural selection so that, at 
least in the context of a hunter-gatherer society, they enhanced genetic fitness. 
While we are sympathetic to this general approach, we have argued (Boyd and 
Richerson, 1985) that many human preferences are difficult to explain on this 
basis. For example, many contemporary professionals seem to sacrifice genetic 
fitness by delaying marriage, reducing family size, and limiting time devoted to 
child care in order to gain professional success. Such behavior is explicable, 
however, if one imagines that individuals who value professional accomplishment 
for its own sake are more likely to rise to positions of influence than those 
with more “sociobiological” values. To take another example, humans cooperate 
in large groups of unrelated individuals to provide public goods (such as 
victory in warfare) in a way that seems difficult to reconcile with individual 
fitness maximization. In the work cited, we have shown how some forms of 
cultural transmission, permitting selection on culture at the level of groups, can 
arise from attempts to use traditions to enhance the ends of genetic fitness. To 
take advantage of the economies of information acquisition that tradition offers 
requires a measure of blind trust of traditional wisdom. Such weak rational 
control on tradition by its users may be sensible but at the same time allows 
culture to respond to blind evolutionary processes unique to the cultural system 
of inheritance. These processes may ultimately have important effects on what 
individuals prefer as well as on what they believe. 

NOTE 

We thank Robert Brandon, John Conlisk, Jack Hirshleifer, Richard Nelson, Eric 
A. Smith, John Staddon, Robert Seyfarth, Joan Silk, Michael Wade, and John Wiley 
for providing comments on an earlier version of this chapter; we also thank John 
Gillespie and Ron Pullman for crucial insights about modeling environmental variation 
and learning, respectively. As tradition dictates, we stipulate that any errors are 
our own. 

Simple Models of Complex 
Phenomena 

The Case of Cultural Evolution 


A great deal of the progress in evolutionary biology has resulted 
from the deployment of relatively simple theoretical models. Staddon’s, Smith’s, 
and Maynard Smith’s contributions illustrate this point. Despite their success, 
simple models have been subjected to a steady stream of criticism. The complexity 
of real social and biological phenomena is compared to the toylike quality 
of the simple models used to analyze them, and their users are charged with 
unwarranted reductionism or plain simplemindedness. 

This critique is intuitively appealing—complex phenomena would seem to 
require complex theories to understand them—but misleading. In this chapter 
we argue that the study of complex, diverse phenomena like organic evolution 
requires complex, multilevel theories but that such theories are best built from 
toolkits made up of a diverse collection of simple models. Because individual 
models in the toolkit are designed to provide insight into only selected aspects 
of the more complex whole, they are necessarily incomplete. Nevertheless, students 
of complex phenomena aim for a reasonably complete theory by studying 
many related simple models. The neo-Darwinian theory of evolution provides 
a good example: fitness-optimizing models, one- and multiple-locus genetic models, 
and quantitative genetic models all emphasize certain details of the evolutionary 
process at the expense of others. While any given model is simple, the 
theory as a whole is much more comprehensive than any one of them. 

Our argument is not very original; the conscious use of the strategy of using 
simple models to study complex phenomena goes back at least as far as Weber’s 
(1949) use of “ideal types” to study human societies. Good modern expositions 
include those by Levins (1966, 1968), Liebenstein (1976), Wimsatt (1980), and 
Quinn and Dunham (1983). If we can contribute anything useful to the case for 
simple models, it is because our work has involved extending standard evolutionary 
theory to a particularly troublesome complexity, cultural inheritance of 
humans (and in rudimentary form, of some other organisms). This work makes a 
variety of uses of starkly simple evolutionary models, including models based on 
the assumption of fitness optimization. Yet one of our concerns has been to 
determine the conditions under which fitness optimization models will fail to 
account for human behavior. Perhaps we have acquired a self-conscious awareness 
of some of the tactical details of the simple-model strategy that will be of 
some use to others. 


The Complexity and Diversity of Evolutionary Processes 

Evolutionary processes are both extremely complex and extremely diverse. On 
this count, those who are skeptical of simple models are certainly on solid 
ground. Every evolving population has a complex history in which many processes 
have contributed to its evolution, including perhaps drift, migration, 
mutation, and many other things besides selection. Further, each of these processes 
can be broken down into a series of interacting subprocesses, each encompassing 
many varieties. Take selection. There is selection on genes with 
large effects, selection on quantitative characters, selection on correlated characters 
and pleiotropic genes, frequency- and density-dependent selection, selection 
on sex-limited and sex-linked characters, sexual selection of a couple 
of kinds, and so on. Aside from viruses, all organisms have an intimidatingly 
large number of interacting genes and phenotypic characters. Environments 
vary in space and time with large effects on migration and selection. Age, sex, 
and social organization structure populations and affect their response to evolutionary 
processes. Developmental processes are complex, although poorly 
understood, and perhaps affect evolution in fundamentally important ways. Organisms 
affect their environments as they evolve. In the case of cultural evolution, 
additional complexities are introduced. We must understand the details of 
how individuals acquire and modify attitudes and beliefs, how different attitudes 
and beliefs interact with genes and environment to produce behavior, and how 
behavior and environment interact to produce consequences for individual lives. 
Obviously, the study of evolutionary processes must somehow cope with this 
complexity. 

Evolutionary processes are diverse because different populations are quite 
different from one another in terms of their biology and the environments to 
which they are and have been exposed. Discoveries about the concatenation of 
processes affecting the evolution of one population or species do not necessarily 
say very much about those in others. In the case of cultural evolution, the details 
of the cultural transmission process vary appreciably from culture to culture. In 
some, fathers are more important in childhood socialization; in others, less. 
Modern societies depend on formal teachers; in traditional societies members 
of the extended family are often important, and so on. Our models of cultural 
evolution suggest that such structural differences can be quite important to 
understanding what cultural traits might evolve. 





Culture and the Evolutionary Process 

In this section, in order to provide a body of detailed examples for use in the later 
sections, we shall sketch some theoretical results from our own work on the complexities 
in the evolutionary process caused by culture. Other kinds of complexities 
of the evolutionary process could be used instead, but we know this one best. 

In the last few years, a number of scholars have attempted to understand the 
processes of cultural evolution in Darwinian terms. Social scientists (Campbell, 
1965, 1975; Cloak, 1975; Durham, 1976; Ruyle, 1973) have argued that the 
analogy between genetic and cultural transmission is the best basis for a general 
theory of culture. Several biologists have considered how culturally transmitted 
behavior fits into the framework of neo-Darwinism (Pulliam and Dunford, 1980; 
Lumsden and Wilson, 1981; Boyd and Richerson, 1983a,b). Other biologists and 
psychologists have used the formal similarities between genetic and cultural 
transmission to develop theory describing the dynamics of cultural transmission 
(Cavalli-Sforza and Feldman, 1973, 1981; Cloninger, Rice, and Reich, 1979; 
Eaves et al., 1978). 

The idea that unifies all this work is that social learning or cultural transmission 
can be modeled as a system of inheritance; to understand the macroscopic 
patterns of cultural change we must understand the microscopic processes 
that increase the frequency of some culturally transmitted variants and reduce 
the frequency of others. Put another way, to understand cultural evolution we 
must account for all of the processes by which cultural variation is transmitted 
and modified. This is the essence of the Darwinian approach to evolution. We 
(Boyd and Richerson, 1985) have been particularly interested in the question of 
the origin of cultural transmission. Under what circumstances might selection on 
genes favor the existence of a second system of inheritance based on the principle 
of the inheritance of acquired variation? 

Cultural and genetic transmission are similar in some respects. For example, 
the skills and dispositions transmitted during enculturation of children by parents 
create patterns of behavior that are very difficult to distinguish empirically 
from patterns resulting from genetic influences. 

In other respects, cultural and genetic transmission differ sharply. First, 
culture is transmitted by an individual observing the behavior of others or by the 
naive being taught by the experienced. This means that behavior modified by 
trial-and-error learning can subsequently be transmitted; culture is a system for 
the inheritance of acquired variation. Second, patterns of cultural transmission 
are quite different from patterns of genetic transmission. Models other than 
biological parents are often imitated, including peers, grandparents, and so forth. 
The cultural analogues of generation length and the mating system are different 
from, and more variable than, the genetic case. Finally, the naive individual acquiring 
an item of culture is a more or less active decision-making participant in 
the transmission process. To some extent, we choose what traits we learn from 
others, but a zygote cannot choose its genes. 

The goal of the Darwinian approach to cultural evolution is to understand 
cultural change in terms of the forces that act on cultural variation as individuals 
acquire cultural traits, use the acquired information to guide behavior, and act as 
models for others. What processes increase or decrease the proportion of people 
in a society who hold particular ideas about how to behave? We thus seek to 
understand the cultural analogues of the forces of natural selection, mutation, 
and drift that drive genetic evolution. These are divisible into three classes: 
random forces, decision-making forces, and natural selection operating directly 
on cultural variation. 

The random forces are the cultural analogues of mutation and drift in genetic 
transmission. Intuitively, it seems likely that random errors, individual idiosyncrasies, 
and chance transmission play a role in behavior and social learning. For 
example, linguists have documented a good deal of individual variation in speech, 
some of which is probably random individual variation (Labov, 1972). Similarly, 
small populations might well lose rare skills or knowledge by chance, for example, 
due to the premature death of the only individuals who acquired them 
(Diamond, 1978). 
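
The chance loss of a rare skill in a small population can be illustrated with a minimal sketch. It assumes a neutral copying scheme in which every naive individual imitates one model drawn at random from the previous generation; the function names, population sizes, and trial counts are illustrative assumptions and do not come from the models cited in the text.

```python
import random


def drift_step(carriers, n, rng):
    """One generation: each of n naive individuals copies a randomly chosen model."""
    p = carriers / n  # current frequency of the skill among potential models
    return sum(1 for _ in range(n) if rng.random() < p)


def run_until_absorbed(carriers, n, rng, max_gens=100_000):
    """Iterate until the skill is lost (0 carriers) or universal (n carriers)."""
    for generation in range(max_gens):
        if carriers == 0 or carriers == n:
            return generation, carriers
        carriers = drift_step(carriers, n, rng)
    return max_gens, carriers


def loss_fraction(carriers, n, trials, seed=0):
    """Fraction of replicate populations in which the skill disappears."""
    rng = random.Random(seed)
    lost = 0
    for _ in range(trials):
        _, final = run_until_absorbed(carriers, n, rng)
        lost += final == 0
    return lost / trials


if __name__ == "__main__":
    # A skill held by a single individual in populations of different sizes.
    for size in (10, 20, 50):
        print(size, loss_fraction(1, size, trials=1000))
```

Under these assumptions a skill held by one person is lost in most replicate populations even though nothing selects against it, which is the sense in which random forces resemble genetic drift.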

Decision-making forces result when naive individuals evaluate alternative 
behavioral variants and preferentially adopt some variants relative to others. 
Naive individuals may be exposed to a variety of models and preferentially imitate 
some rather than others. We call this force biased transmission. Alternatively, 
individuals may modify existing behaviors or invent new ones by individual 
learning. If the modified behavior is then transmitted, the resulting force is much 
like the guided, nonrandom variation of classical “Lamarckian” transmission. 
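
A decision-making force of this kind can be sketched as a simple recursion. The sketch below assumes two variants, a frequency p of the preferred variant, and a change per generation proportional to a bias strength and to the variance p(1 - p). The linear form and the parameter name `bias` are assumptions made for illustration, not a formula given in the text.

```python
def biased_transmission_step(p, bias):
    """Frequency of the preferred variant after one round of biased imitation.

    p    -- frequency of the preferred variant among models (0 <= p <= 1)
    bias -- assumed strength of the preference (0 <= bias <= 1); 0 means
            imitation is unbiased and frequencies do not change
    """
    return p + bias * p * (1.0 - p)


def trajectory(p0, bias, generations):
    """Frequencies over time, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(biased_transmission_step(freqs[-1], bias))
    return freqs


if __name__ == "__main__":
    for t, p in enumerate(trajectory(0.1, 0.2, 30)):
        if t % 5 == 0:
            print(t, round(p, 3))
```

In this sketch the force vanishes when the population is uniform (p equal to 0 or 1), because a naive individual who sees only one variant has nothing to choose among.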

The decision-making forces are derived forces (Campbell, 1965). Decisions 
require rules for making them, and ultimately the rules must derive from the 
action of other forces. These decision-making rules may be acquired during an 
earlier episode of cultural transmission, or they may be genetically transmitted 
traits that control the neurological machinery for acquisition and retention of 
cultural traits. The latter possibility is the basis of the various sociobiological 
hypotheses about cultural evolution (Alexander, 1979; Lumsden and Wilson, 
1981). The authors of these hypotheses, among others, argue that the course of 
cultural evolution is determined by natural selection operating indirectly on 
cultural variation via the decision-making forces. 

Natural selection may also operate directly on cultural variation. Selection is 
an extremely general evolutionary process (Campbell, 1965). Darwin was able 
to formulate a clear statement of natural selection in the absence of a correct 
understanding of genetic inheritance because it is a force that will operate on any 
system of inheritance with a few key properties. There must be heritable variation, 
the variants must affect phenotype, and the phenotypic differences must 
affect individuals’ chances of transmitting the variants they carry. That variants 
are transmitted by imitation rather than sexual or asexual reproduction does not 
affect the basic argument, nor does the possibility that some of the variants were 
originally acquired under the guidance of individual decisions. Darwin had no 
problem in imagining that random variation, acquired variation, and natural 
selection all acted together as forces in organic evolution. In the case of cultural 
evolution, we see none either. 

We have attempted to construct a series of models that represent all of 
the processes sketched in the previous section. One interesting general result is 
that the processes of cultural evolution can easily lead to the evolution of behaviors 
that reduce Darwinian fitness, especially when nonparental individuals 
are important in cultural transmission. In the simplest model we have analyzed 
(Richerson and Boyd, 1984) natural selection acting on cultural variation transmitted 
by a parent and a “teacher” may cause the trait favoring transmission via 
teachers to go to fixation at a cost in terms of the number of children produced 
by parents. Some Darwinian students of humans (Alexander, 1979; Lumsden 
and Wilson, 1981; Durham, 1976) argue that such effects are unlikely to be important 
because a system of cultural inheritance with such properties would 
not be favored by selection on genes. Selection, the argument would run, ought 
to have acted to prevent such distorted cultural adaptations by either (1) the 
creation of decision-making forces that counteract the effect of selection on 
nonparentally transmitted cultural variation or (2) preventing nonparental individuals 
from becoming important in cultural transmission. 

We believe this argument is incomplete because it ignores the fact that 
individual decision making may be costly compared to social learning. If the costs 
of using individual decision-making processes are high, selection may not favor 
decision-making forces that would completely compensate for the maladaptive 
effects of nonparental transmission. Similarly, if nonparental patterns of cultural 
transmission offer individuals economy in information acquisition, 
selection may favor the genes that underlie a capacity for asymmetric 
transmission. 

For example, nonparental individuals may be more useful models than 
parents because they may be more skilled or knowledgeable than parents. The 
effort in decision making required to discriminate exactly among the adaptive 
skills and maladaptive inclinations of teachers and other nonparental models may 
require extensive, costly, empirical checks of each element of the teacher’s behavior. 
In contrast, the use of relatively simple, low-cost decision-making rules 
to bias the choice of models or which of their behaviors to imitate may substantially 
increase a naive person’s skills at a tolerable cost of imitating some 
maladaptive behaviors. We have analyzed the evolutionary consequences of a 
variety of simple bias rules. These models suggest that nonparental transmission 
may often be adaptive despite the cost of selection, especially in spatially variable 
environments (Boyd and Richerson, 1982, 1985: chs. 7 and 8). In essence, humans 
may accept the cost of imitating maladaptive cultural traits because the 
alternatives are a high frequency of random errors or extreme decision-making 
costs. Even when a cultural system of inheritance optimizes genetic fitness when 
averaged over all the traits it transmits, many traits taken individually may be 
quite far from those that would optimize fitness. 

Even more extreme violations of the genetic fitness-optimizing model are 
conceivable. For example, if rules of mate choice are transmitted culturally, 
human genes might be “domesticated” to serve cultural functions. On the other 
hand, perhaps the critics of these models are correct, and the abstract possibilities 
demonstrated by such models are empirically unimportant. The essential point is 
that, like many bits of genetic realism, adding culture to the evolutionary process 
might make a qualitative difference in the behavior we expect to observe compared 
to that expected from the simple fitness-optimizing caricature of evolution. 

Why Families of Simple Models 
Disadvantages of Complex Models 

In the face of the complexity of evolutionary processes, the appropriate strategy 
may seem obvious: to be useful, models must be realistic; they should incorporate 
all factors that scientists studying the phenomena know to be important. 
This reasoning is certainly plausible, and many scientists, particularly in economics 
(e.g., Hudson and Jorgenson, 1974) and ecology (Watt, 1968), have 
constructed such models, despite their complexity. On this view, simple models 
are primitive, things to be replaced as our sophistication about evolution grows. 

Nevertheless, theorists in such disciplines as evolutionary biology and economics 
stubbornly continue to use simple models even though improvements in 
empirical knowledge, analytical mathematics, and computing now enable them 
to create extremely elaborate models if they care to do so. Theorists of this 
persuasion eschew more detailed models because (1) they are hard to understand, 
(2) they are difficult to analyze, and (3) they are often no more useful for 
prediction than simple models. Let us now consider each of these points in turn. 

Complex, detailed models are usually extremely difficult to understand. As 
more realism is added, the myriad interactions within the model become almost 
as opaque as the real world we wish to understand. When a set of not-so- 
complex parts is linked into an interacting complex, it is often impossible to 
understand why the results behave as they do. To substitute an ill-understood 
model of the world for the ill-understood world is not progress. In the end, the 
only way to understand how such a model works is to abstract pieces from it or 
study simplified cases where its behavior is more transparent. Even when 
complex models are useful, they are so because we understand how they work in 
terms of simple models abstracted from them. 

Costly, complex models are most likely to be scientifically justified when 
phenomena are complex but not diverse. It is worth studying the complexities of 
atoms in great detail because there are only a few kinds, and they all obey the 
same basic laws. The generality of such laws makes them worth knowing even if 
the task is difficult. The equivalent sophistication in a model of the evolution of a 
given society or species is possible, perhaps, but unlikely to be justified on scientific 
grounds because of limited generalizability to other species or societies. 

The analysis of complex models is also expensive and time consuming. The 
complexity of a recursion model is roughly measured by the number of independent 
variables that must be kept track of from generation to generation. It 
usually is not possible to analyze nonlinear recursions involving more than a 
handful of variables without resorting to numerical techniques. Until the advent 
of digital computers, obtaining numerical solutions was impractical. Since then, 
however, there have been many attempts to make computer simulation models 
of complex social and biological processes. These projects have generally been 
quite costly. As the number of variables in a model increases, the number of 
interactions between variables increases even faster. This means that even with 
the fastest computers, it is not practical to explore the sensitivity of a model to 
changes in assumptions about very many of its constituent interactions. Considerations 
of economy of effort in scientific practice dictate that we should be 
satisfied with much simpler models than we could build in principle. 
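
The growth in analytical cost can be made concrete with a small calculation. The sketch below counts only pairwise interactions among variables and assumes a full-factorial sensitivity analysis in which every parameter is tried at a fixed number of values. Both simplifications, and the function names, are assumptions chosen for illustration.

```python
from math import comb


def pairwise_interactions(n_variables):
    """Number of distinct pairs of variables that could interact."""
    return comb(n_variables, 2)


def sensitivity_runs(n_parameters, levels):
    """Model runs needed to try every parameter at `levels` values in all combinations."""
    return levels ** n_parameters


if __name__ == "__main__":
    for n in (3, 10, 30, 100):
        print(n, pairwise_interactions(n), sensitivity_runs(n, 3))
```

Pairwise interactions grow roughly as the square of the number of variables, and the number of runs in an exhaustive sweep grows exponentially, so doubling the size of a model far more than doubles the work of understanding it.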

Complex, realistic models are sometimes employed when prediction rather 
than understanding is the main goal. Numerical weather prediction models and 
economic forecasting models come to mind. In both cases the gains in understanding 
of atmospheric and economic phenomena are mostly attributable to the 
constituent simple submodels of particular processes that are individually not 
much good for prediction. The marginal increase in understanding relative to 
cost in the large predictive models is so small that only their practical application 
justifies their expense; scientific discovery would be better served by more 
attention to the simpler models. As Dupre (1987) observes, explanation differs 
from prediction in being easier to achieve (leaving aside statistical models that 
make no pretensions to explanation). We would argue in addition that explanation 
or understanding is scientifically far more fundamental than prediction. 
This is most clearly evident in examples such as the simple deterministic models 
of economic and population processes that can exhibit chaotic behavior (Day, 
1982; May, 1976). If these models prove to apply in the real world, they will 
guarantee that only short-range predictions are possible with less than perfect 
specification of initial conditions, but they also give a quite satisfactory explanation 
of why this is so. The problem is well understood in the context of a 
purely physical problem, weather prediction (Smagorinsky, 1969). 
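
The sensitivity to initial conditions described here can be seen in a one-line deterministic recursion. The sketch below uses the logistic map, x' = r x (1 - x), as a stand-in for the simple population models discussed by May (1976). The growth parameter r = 3.9, the starting values, and the function names are assumptions chosen for illustration.

```python
def logistic_step(x, r):
    """One generation of the logistic map, a simple deterministic recursion."""
    return r * x * (1.0 - x)


def separation(x0, error, r, steps):
    """Absolute gap between two runs whose starting points differ by `error`."""
    x, y = x0, x0 + error
    gaps = [abs(x - y)]
    for _ in range(steps):
        x, y = logistic_step(x, r), logistic_step(y, r)
        gaps.append(abs(x - y))
    return gaps


if __name__ == "__main__":
    gaps = separation(0.2, 1e-10, 3.9, 200)
    for t in (0, 10, 20, 40, 80):
        print(t, gaps[t])
```

The two runs are indistinguishable at first and later differ by an amount comparable to the range of the variable itself. The model thus explains why long-range prediction fails, even though it cannot itself supply such predictions.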

Detailed models of complex social or biological systems are often not much 
more useful for prediction than are simple models. Detailed models usually require 
very large amounts of data to determine the various parameter values in the 
model. Such data are rarely available. Moreover, small inaccuracies or errors 
in the formulation of the model can produce quite erroneous predictions. The 
temptation is to “tune” the model, making small changes, perhaps well within 
the error of available data, so that the model produces reasonable answers. 
When this is done, any predictive power that the model might have is due more 
to statistical fitting than to the fact that it accurately represents actual causal 
processes. It is easy to make large sacrifices of understanding for small gains in 
predictive power. Contrarily, although evolutionary processes are inherently 
complex and diverse, models with a few variables may capture enough of the 
really important processes in a given case or class of cases both to explain and to 
predict with tolerable accuracy. 

The Utility of Simple Models 

In the face of these difficulties, the most useful strategy will usually be to build a 
variety of simple models that can be completely understood but that still capture 
the important properties of the processes of interest. Liebenstein (1976: ch. 2) 
calls such simple models “sample theories.” Students of complex and diverse 
subject matters develop a large body of models from which “samples” can be 
drawn for the purpose at hand. Useful sample theories result from attempts to 
satisfy two competing desiderata: they should be simple enough to be clearly and 
completely grasped, and at the same time they should reflect how real processes 
actually do work, at least to some approximation. A systematically constructed 
population of sample theories and combinations of them constitutes the theory 
of how the whole complex process works. 

The synthetic theory of evolution provides a good example. Each of the 
basic processes (e.g., selection, mutation, drift) is represented by a large variety 
of simple models, some specific to a particular population, and others quite 
general. These models are combined in different ways to represent interesting 
phenomena (e.g., sexual selection, speciation). This whole family of models, 
together with a knowledge of which models are appropriate for what kinds of 
situations, constitutes the theoretical system of population biology. 

A theoretical system so constituted from simple sample models is a complicated 
and diverse collection of knowledge; it cannot be legitimately labeled 
simpleminded. Still, every tactical deployment of models to study a question of 
interest will be quite simple compared to the phenomena that they are intended 
to represent. The sample models are caricatures. If they are well designed, they 
are like good caricatures, capturing a few essential features of the problem in a 
recognizable but stylized manner and with no attempt to represent features not 
of immediate interest. 

Wimsatt (1980, 1981) provides good general discussions of tactical considerations 
in the deployment of simple models. To Wimsatt, all sample models 
of evolutionary phenomena should be viewed as “heuristics” rather than universally 
applicable laws. This terminology has the virtue of emphasizing that all 
sample models have defects. They usefully apply only over a limited range of 
phenomena, and even over the range where they are useful they are almost 
certain to have biases. Even the very best scientific heuristic (or sample model) 
will fail and possibly mislead if pushed too far or in the wrong direction. It is in 
attention to details of the use of simple sample theories that these problems are 
minimized and the maximum understanding gained. The user attempts to discover 
“robust” results, conclusions that are at least qualitatively correct, at least 
for some range of situations, despite the complexity and diversity of the phenomena 
they attempt to describe. 

Note that simple models can often be tested for their scientific content via 
their predictions even when the situation is too complicated to make practical 
predictions. Experimental or statistical controls often make it possible to expose 
the variation due to the processes modeled, against the background of “noise” 
due to other ones, thus allowing a ceteris paribus prediction for purposes of 
empirical testing. Simple models, in other words, are the formal theoretical 
parallel of the experimental and comparative methods so widely used in biology 
and the social sciences. 

Generalized Sample Theories 

Generalized sample theories are an important subset of the simple sample 
theories used to understand complex, diverse problems. They are designed to 
capture the qualitative properties of the whole class of processes that they are 
used to represent, while more specialized ones are used for closer approximations 
to narrower classes of cases. Generalized sample theories are useful because 
we do not seem to be able to construct models of social and biological phenomena 
that are general, realistic, and precisely predictive (Levins, 1966, 1968). 
That is, evolutionary biologists and social scientists have not been able to satisfy 
the epistemological norm derived from the physical sciences that holds that theory 
be in the form of universal laws that can be tested by the detailed predictions 
they make about the phenomena considered by the law. This failure is probably 
a consequence of the complexity and diversity of living things. Basic theoretical 
constructs like natural selection are not universal laws like gravitation; rather, 
they are taxonomic entities, general classes of similar processes that nonetheless 
have a good deal of diversity within the class. A theoretical construct designed to 
represent the general properties of the class of processes labeled natural selection 
must sacrifice many of the details of particular examples of selection. On the 
other hand, a model tailored to the details of a particular case is unlikely to 
have much relevance beyond that case. Further, the most precise predictions 
may be obtained by statistical models that sacrifice realism and hence are useless 
as explanatory devices. 

One might agree with the case for a diverse toolkit of simple models but still 
doubt the utility of generalized sample theories. Fitness-maximizing calculations 
are often used as a simple caricature of how selection ought to work most of 
the time in most organisms to produce adaptations. Does such a generalized 
sample theory have any serious scientific purpose? Some might argue that their 
qualitative kind of understanding is, at best, useful for giving nonspecialists a 
simplified overview of complicated topics and that real scientific progress still 
occurs entirely in the construction of specialized sample theories that actually 
predict. A sterner critic might characterize the attempt to construct generalized 
models as loose speculation that actually inhibits the real work of discovering 
predictable relationships in particular systems. 

These kinds of objections implicitly assume that it is possible to do science 
without any kind of general model. All scientists have mental models of the world. 
The part of the model that deals with their disciplinary specialty is more detailed 
than the parts that represent related areas of science. Many aspects of a scientist’s 
mental model are likely to be vague and never expressed. The real choice is between
an intuitive, perhaps covert, general theory and an explicit, often mathematical,
one.

It seems to us that generalized sample models such as fitness-optimizing 
models do play an important role. Well chosen to represent the stripped-down 
essence of a much larger set of more specialized models, generalized sample
theories serve important functions in scientists’ cognitive organization of
complex-diverse subject matters and in communication between specialists. For 
example, we are concerned with the details of how cultural transmission occurs, 
a subject studied by psychologists (Boyd and Richerson, 1985: ch. 3). Social 
learning theorists have made many, but not all, of the kinds of measurements 
that are necessary for specifying good sample theories of cultural transmission. 
Crucial unknowns include the mechanisms by which variation and covariation 
are maintained in cultural traits. These properties have important implications 
for the process of cultural evolution because the selection and bias forces depend 
on the maintenance of variation for their effectiveness. These deficiencies of
social learning theory are not at all apparent in the absence of a theory linking the
psychology of enculturation with the macroscopic phenomena of social institutions
and long-run outcomes. It seems unlikely that a sensible psychologist
would be motivated to make the arduous and costly experiments necessary to 
determine such processes without a general theoretical argument justifying their 
importance. This is an example of a common situation: constructing models that 
make such links, even if they are simple caricatures, often shows that processes 
with small, relatively hard to measure, effects can produce major results. 

The relationship between a generalized sample theory and empirical test or 
prediction is a subtle one. To insist upon empirical science in the style of physics is 
to insist upon the impossible. However, to give up on empirical tests and prediction
would be to abandon science and retreat to speculative philosophy.
Generalized sample theories normally make only limited qualitative predictions. 
The logistic model of population growth is a good elementary example. At best, 
it is an accurate model only of microbial growth in the laboratory. However, it 
captures something of the biology of population growth in more complex cases. 
Moreover, its simplicity makes it a handy general model to incorporate into 
models that must also represent other processes such as selection, and intra- and 
interspecific competition. If some sample theory is consistently at variance with 
the data, then it must be modified. The accumulation of these kinds of modifications
can eventually alter general theory, either by compelling the abandonment
of some sample models or by systematizing knowledge about the
variation of processes. In extreme cases, major discoveries in some of the components
of a general theory can compel the reorganization of the entire edifice, as
exemplified by the impact of Mendelian genetics on Darwinian theory in biology. 
No one nowadays would think of using Karl Pearson’s models of the inheritance 
of acquired variation as a sample theory of genetic inheritance, although they 
might have some specialized uses in the study of cultural evolution. 
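
As a rough illustration of the elementary example above, the logistic model can be written as dN/dt = rN(1 - N/K), where N is population size, r the intrinsic rate of increase, and K the carrying capacity. The following Python sketch integrates it with simple Euler steps. The function name, the parameter values, and the step size are assumptions made for illustration; they do not come from the text.

```python
def logistic_trajectory(n0, r, k, steps, dt=0.1):
    """Euler integration of dN/dt = r * N * (1 - N / K).

    n0: initial population size; r: intrinsic growth rate;
    k: carrying capacity. All values used below are illustrative.
    """
    n = float(n0)
    trajectory = [n]
    for _ in range(steps):
        # Growth slows as n approaches the carrying capacity k.
        n += dt * r * n * (1.0 - n / k)
        trajectory.append(n)
    return trajectory


# Hypothetical parameter values chosen only for illustration.
traj = logistic_trajectory(n0=10.0, r=0.5, k=1000.0, steps=400)
print(f"start={traj[0]:.1f}, end={traj[-1]:.1f}")
```

The same few lines can serve as a growth term inside a larger model, which is the sense in which the simplicity of the logistic makes it a handy building block.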

A generalized model is useful so long as its predictions are qualitatively 
correct, roughly conforming to the majority of cases. It is helpful if the inevitable 
limits of the model are understood. It is not necessarily an embarrassment if more 
than one alternative formulation of a general theory, built from different sample
models, is more or less equally correct. In this case, the comparison of theories
that are empirically equivalent makes clearer what is at stake in scientific controversies
and may suggest empirical and theoretical steps toward a resolution.


Some Remarks on the Strategy of Building Simple Models 

One of the main points of the preceding discussion is that the analysis of evolutionary
problems using simple models depends very much on the appropriate
choice of those models. How does one go about making such choices? Evolutionary
biologists and social scientists use a variety of methods to accomplish this
task that, we believe, can be collected under three main headings, corresponding
to idealized analytical steps: (1) the choice of problem, (2) the modularization
of analysis, and (3) the construction of synthetic hypotheses that we shall
call “plausibility arguments.”

Choice of Problem

When one uses simple models to understand complex and diverse problems, the 
choice of the problem to be analyzed exerts a strong influence on the kinds of 
simplifications one chooses. The idea is to simplify most drastically those aspects 
that are not centrally related to the problem at hand in order to retain the 
maximum feasible detail in the features of most direct interest. In the case of our 
models of cultural evolution, we have been concerned with the evolution of cultural
organisms from acultural ancestors. This required us to represent the processes
of ordinary organic evolution in most of our modeling efforts. Still, we were
also interested in trying to develop preliminary general models of the important 
structural features and forces that affect cultural evolution. Given this choice of 
problem, it seemed advisable to use very simple models of genetic processes to 
represent the evolution of genetic capacities for culture in order that the models 
of cultural transmission could be made a bit more elaborate. Thus, we frequently 
asked what parameter value of a model controlling the propensity to acquire 
culture in a certain way would cause fitness to be optimized. Those models that 
included specific genetics used only the simplest haploid, one locus, or quantitative
models of genetic transmission.

Models emphasizing cultural detail at the expense of genetic detail accept 
the risk that some particular complexity of the human genetic system plays a 
direct role in the coevolution of genes and culture. For example, if genes affecting
the behavior toward relatives are transmitted on the Y chromosome,
as Hartung (1976) suggested, the models we constructed might turn out to be 
seriously misleading. The opposite risk, however, seemed more serious to us in 
the context of the problem; in models that are too complex, the important details
of culture itself might be obscured or lost. Several commentators (Maynard
Smith and Warren, 1982; Boyd and Richerson, 1983b; Kitcher, 1985) have remarked
that the analysis that led Lumsden and Wilson (1981) to their “thousand
year rule” is dubious because key properties of culture disappear as a result of
simplifying assumptions. The general formulation of their model is conceptually
satisfactory, but its complexity appears to have dictated misleading simplifications
in the interests of successful analysis.

Modularization of Analysis 

Most interesting evolutionary problems involve the interaction of evolutionary 
processes and a particular pattern of genetic transmission and gene expression. 
For example, the interaction of selection and mutation at a diploid locus is a 
classic problem of the synthetic theory. The sample models of the parts of this 
problem are less interesting than the combination of them in a model that can 
help us understand how the two basic forces interact with genetically inherited 
variation. Similar problems are of interest in cultural evolution. How does 
learning, acting as an evolutionary force because learned variants can be imitated, 
interact with selection, both selection on the cultural variants and on the underlying
senses of reward and punishment that guide learning? Such combinations
of processes inevitably make for relatively complex models. To make any
headway, relatively difficult mathematical and experimental procedures have to
be introduced, and many simplifying assumptions have to be made. Difficult
choices between analytical tractability, comprehensibility, generality, and realism
have to be made. Is a fitness optimization representation of the genetic
process a reasonable simplification, or can some additional genetic realism be 
usefully retained in the context of the problem? 

The answers to such questions are sought by breaking the problem down 
first into its constituent sample models and then reassembling them step by step 
into more complex combinations. This tactic is obvious but easily misunderstood 
and misused. In the long run, the simple models strategy leads to large families 
of well-understood sample models, some of which will be relatively complex, 
specialized, and difficult to understand. Also, relatively complex combinations of 
models are often useful. However, such relatively complicated models depend 
on a thorough understanding of the simplest models of each family and of the 
constituent submodels of compound models. The possibility for artifactual results
increases with the complexity of the analysis unless one can be reasonably
confident that the constituent sample models are empirically reasonable and 
mathematically well behaved. It is relatively much easier to conduct experiments 
and detailed mathematical analysis on processes when they are isolated than 
when they are imbedded in a complex system. In population biology, both 
history and pedagogic practice suggest that one must begin with an understanding
of the elementary constituents of the theory.

While building models of complex processes composed of simpler modules 
may be second nature to evolutionary biologists, in our experience it sometimes 
confuses social scientists who read the present body of theory in cultural evolution.
The modularization of complex problems seems reductionistic; even after
the parts are reassembled it seems to some readers as if the models are attempting
to deduce the properties of wholes from properties of parts. The tactical
“reductionism” used to understand a problem does not imply that the interaction 
of parts might not produce irreducible effects. For example, some models of 
culture built using this tactic suggest that group selection might be especially 
likely under some plausible forms of cultural transmission (Boyd and Richerson, 
1985: ch. 7).

Sometimes, evolutionary biologists (and social scientists who use similar 
methods, such as economists) contribute to the confusion by failing to distinguish
the heuristic use of tactical reductionism from a real belief that
some particular simple model is a true description of a complex process. Indeed, 
the relative ease with which interesting, even approximately correct, results can 
be obtained for intrinsically rather complex processes with simple models can 
lead the unwary to conclude that successful tactical reduction implies the adequacy
of a philosophical reductionist stance. Those who are so tempted should
consult Wimsatt’s work. Most users of simple models know better. For example, 
Dawkins (1982), a prototypical genetic reductionist by some accounts (Sober, 
1984), begins his discussion (pp. 1-2) by asking the reader to take his idea of 
selfish genes with extended phenotypes as a heuristic model. Later (by p. 7), 
Dawkins does express the hope that it may prove more fundamental than a mere 
heuristic, but the distinction between the two interpretations is clear, and the
reader is left the choice. 

The development of a formal theory of cultural evolution is in its infancy, 
and attention has properly concentrated on quite elementary models. This means 
that the theory to date appears quite reductionistic. For example, most models 
consider only one cultural trait. On the one hand, an overenthusiast might claim 
that these models are relatively successful in explaining human behavior and 
hence that human cultures really can be atomized into traits. On the other hand, 
a critic might complain that they are completely bankrupt because they do not 
take account of the fact that cultural traits must interact in complex ways. The 
fact is that such preliminary models are silent about what complexities might 
flow from the interaction of multiple traits. That is a difficult question in its own 
right, but one whose analysis must be deferred until we understand the simpler 
theoretical elements we might use in such an analysis. 

The thorough study of simple models includes pressing them to their extreme
limits. This is especially useful at the second step of development, where
simple models of basic processes are combined into a candidate generalized 
model of an interesting question. There are two related purposes in this exercise. 

First, it is helpful to have all the implications of a given simple model 
exposed for comparative purposes, if nothing else. A well-understood simple 
sample theory serves as a useful point of comparison for the results of more 
complex alternatives, even when some conclusions are utterly ridiculous. 

Second, models do not usually just fail; they fail for particular reasons that 
are often very informative. Just what kinds of modifications are required to make 
the initially ridiculous results more nearly reasonable? For example, the failures 
of the logistic model of population growth suggest the amendments needed to 
make better models. In the case of culture, models that include only faithful 
cultural transmission suggest that culture is generally inferior to genes as a mode 
of inheritance (Cavalli-Sforza and Feldman, 1983). If the evolution of culture in 
the hominid line was favored by natural selection, there must be more to the 
story than just the acquisition of behavior by imitation. We have suggested that 
the ability of culture to couple individual learning to a transmission mechanism, 
thus to generate a system for the inheritance of acquired variation, could cause 
capacities for culture to evolve (Boyd and Richerson, 1983a, 1985: ch. 4). However,
this analysis also fails because it suggests that the advantages of culture are
quite general, and hence that many organisms ought to have “Lamarckian” 
systems of inheritance. This failure in turn suggests that there are other costs to 
the inheritance of acquired variation that must be accounted for. 

In both of these respects, human sociobiology has made a major contribution
by showing what must be true if the genetic fitness optimizing model
generally holds when behavioral variation is proximally transmitted by culture. 
For example, Alexander (1979; see also Flinn and Alexander, 1982) argues that 
decision-making forces are powerful enough to constrain cultural variation to 
maximize fitness in most circumstances. Important qualitative predictions flow 
from this argument. If strong, accurate decision making is possible, then humans 
need not depend on relatively passive imitation; they can easily invent or choose 
those behaviors appropriate to the environments they find themselves in. If so,
culture will behave more like ordinary mechanisms of phenotypic flexibility than
like an inheritance system. Empirically, behavioral variation will be largely explicable,
even in the short run, in terms of environmental variation rather than
the variation in what traits are available for imitation. This argument also implies 
that costs of making decisions are low relative to any economies that might result 
from imitation. In our judgment (Boyd and Richerson, 1985: ch. 5), theory and 
the available data suggest that Alexander’s argument is incorrect in general, 
although it may well be roughly correct for those traits for which accurate 
decision making is easy. Regardless of whether we or Alexander ultimately prove 
more nearly correct, his contribution is substantial; work on the complexities of 
culture is much aided by having the implications of the simplest genetic fitness- 
maximizing model incorporating culture cogently developed. 

The exhaustive analysis of many sample models in various combinations is 
also the main means of seeking robust results (Wimsatt, 1981). One way to gain 
confidence in simple models is to build several models embodying different 
characterizations of the problem of interest and different simplifying assumptions.
If the results of a model are robust, the same qualitative results ought to
obtain for a whole family of related models in which the supposedly extraneous 
details differ. The fact that genetic and game theoretic models of altruism usually
lead to similar conclusions reassures us that general results like Hamilton’s
k > 1/r rule are robust. Similarly, as more complex considerations are introduced
into the family of models, simple model results can be considered robust only 
if it seems that the qualitative conclusion holds for some reasonable range of 
plausible conditions. Thus, quantitative genetic (Boyd and Richerson, 1982) and 
multiple-locus models (Uyenoyama and Feldman, 1980) suggest that Hamilton’s 
rule is approximately correct when a variety of complications is introduced. 
Complications substantially affect the exact form of the rule, but do preserve the 
qualitative result that kin cooperation can evolve and the propensity to cooperate
should be a function of relatedness under most circumstances that seem
empirically reasonable. Nevertheless, it is slow and difficult work to make reasonably
certain that particular results can be treated as robust (Wimsatt, 1980).
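
For readers unfamiliar with the rule, a small worked check may help. In its usual statement, Hamilton's rule says that an altruistic act is favored when k, the ratio of the benefit to the recipient to the cost to the actor, exceeds 1/r, where r is the coefficient of relatedness; equivalently, r times benefit must exceed cost. The Python sketch below encodes only this simplest form. The function name and the numerical values are illustrative assumptions, and none of the complications discussed above (quantitative genetics, multiple loci) are represented.

```python
def altruism_favored(benefit, cost, relatedness):
    """Simplest form of Hamilton's rule.

    Returns True when relatedness * benefit > cost, which is the same
    condition as k = benefit / cost > 1 / relatedness for positive values.
    """
    return relatedness * benefit > cost


# Illustrative values: benefit three times the cost, so k = 3.
print(altruism_favored(3.0, 1.0, 0.5))    # full siblings: 1/r = 2, k exceeds it
print(altruism_favored(3.0, 1.0, 0.125))  # first cousins: 1/r = 8, k falls short
```

The qualitative result that survives the complications is visible even here: for a fixed benefit and cost, the act is favored toward close relatives and not toward distant ones.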

In the case of cultural evolution, we make the tentative claim that the costly 
information argument is a robust result. In all of the models we have constructed 
of the novel structural properties of culture and the evolutionary forces that 
result from them, it seems that optimizing the genetic fitness of a capacity for 
culture generally leads to a situation in which many individual cultural traits can 
easily evolve to values quite distant from those that would maximize fitness, so 
long as decision making is costly. These results do not depend on whether cultural
traits are imagined to be discrete characters or continuous quantitative
variables, for example. The tentativeness of the claim must be emphasized because
the whole corpus of models of cultural evolution is still so small.

Plausibility Arguments 

We believe that “plausibility argument” is a useful term for a scientific construct
that plays much the same role in the study of complex, diverse phenomena that 
mutually exclusive hypotheses are supposed to play in the investigation of
simpler subject matters. A plausibility argument is a hypothetical explanation 
having three features in common with a traditional hypothesis: (1) a claim of 
deductive soundness, of in-principle logical sufficiency to explain a body of data; 
(2) sufficient support from the existing body of empirical data to suggest that 
it might actually be able to explain a body of data as well as or better than competing
plausibility arguments; and (3) a program of research that might distinguish
between the claims of competing plausibility arguments. The differences
are that competing plausibility arguments (1) are seldom mutually exclusive, (2) 
can seldom be rejected by a single sharp experimental test (or small set of them), 
and (3) often end up being revised, limited in their generality or domain of 
applicability, or combined with competing arguments rather than being rejected. 
In other words, competing plausibility arguments are based on the claims that 
a different set of submodels is needed to achieve a given degree of realism and 
generality, that different parameter values of common submodels are required, 
or that a given model is correct as far as it goes, but applies with less generality, 
realism, or predictive power than its proponents claim. Most frequently, the 
empirical program suggested by competing plausibility arguments is an arduous 
series of measurements of the relative strengths of several known processes in a 
wide range of organisms. 

The reason for these differences is that quantitative questions are at the crux 
of debates about evolutionary processes. For example: how strong is selection 
among individuals relative to selection among groups? Theoretical analysis 
suggests that selection among groups must be commonplace, and laboratory 
experiments (Wade, 1977) demonstrate that it could have important effects. 
However, it is not at all clear whether selection among groups is important 
in nature. Sex ratio provides another example. Clear examples of sex ratio distortion
exist (Hamilton, 1967), and theory suggests that it should be favored
under a wide variety of ecological conditions (Charnov, 1982). Yet this process 
seems to be relatively rare—at least weak enough to neglect in most cases. Even 
if we are willing to be content with qualitative knowledge of complex processes, 
the term “qualitative” must be taken in the sense of rough estimates of quantitative
variables, not in the sense of simple acceptance or rejection of mutually
exclusive hypotheses. This feature of evolutionary problems is the basis for 
Quinn and Dunham’s (1983) rejection of Popperian falsification as a proper 
epistemological model in ecology and evolution (see also Rapoport’s, 1967, 
claim that many scientific paradoxes have been resolved when the polar positions 
were shown to be only opposite ends of a continuum). 

Human sociobiology provides a good example of a plausibility argument. 
The basic premise of human sociobiology is that fitness-optimizing models 
drawn from evolutionary biology can be used to understand human behavior. 
Many social scientists have objected to this enterprise on the grounds that 
evolutionary theory does not account for the existence of culture. As we have 
already noted, Alexander (1979), Lumsden and Wilson (1981), Durham (1976), 
and others have defended the fitness-optimizing approach not by denying the 
importance of culture but by proposing various means by which decision-making 
forces could evolve under the guidance of selection to constrain cultural evolution 
so as generally to produce fitness-optimizing behavior. These authors have
supported their plausibility argument by constructing an array of simple models
that predict the details of human behavior in various circumstances (for example,
patterns of adoption, unilineal descent, and child abuse) and compared
the results of these simple models with empirical data. 

The sociobiological explanations of human behavior and those derived from 
explicit models of cultural evolution provide an example of competing plausibility
arguments. As Flinn and Alexander (1982) argue, there is wide agreement
among Darwinian students of the problem of human evolution that culture is 
important and that the processes of cultural evolution may sometimes fail to 
keep cultural variation “on track” of genetic fitness (e.g., Alexander, 1979:142). 
Disagreements revolve around the relative strength of decision-making forces 
compared to natural selection on cultural variation, the degree to which cultural 
transmission acts like an inheritance system rather than an ordinary mechanism 
for phenotypic flexibility, the importance of nonparental transmission, and so 
forth. For example, we have argued that decision making is frequently costly and 
that this allows culture a certain autonomy, while Durham (1976) argues that 
cultural evolution will be constrained to produce behaviors that approximately 
maximize fitness most of the time. 

We think that the clearest way to address the controversial questions raised 
by competing plausibility arguments is to try to formulate models with parameters
such that for some values of the critical parameters the results approximate
one of the polar positions in such debates, while for others the model
approximates the other position. If the parameters that produce these contrasting
results capture some real features of the processes of cultural and genetic coevolution,
it may be possible to understand at least what is at stake in the controversy.
In the models we have constructed, several parameters control the extent to 
which a typical cultural trait will be at the fitness optimum. If decisions about 
what cultural behaviors to adopt or invent can be made easily and accurately, and 
the rules that guide choices are ultimately transmitted genetically and subject to 
selection, culture will be very strongly constrained to maximize genetic fitness. 
Similarly, if important cultural traits are transmitted mostly from biological 
parents to offspring, cultural variation will act much like an extra chromosome of 
a biochemically odd kind. Even if decision-making forces are weak, selection on 
cultural variation will favor individual (inclusive) reproductive success, subject 
only to the same kinds of qualifications that obtain for a genetic locus. This result 
seems to approximate Durham’s (1976) argument. As decision-making costs and 
nonparental transmission are allowed to become more important, cultural evolution
becomes less directly constrained by selection on genes that control culture
and it is possible to approximate positions like the group-functionalism of many 
social scientists and the afunctional position of Sahlins (1976). 
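
The general idea of a model whose parameters span the polar positions can be sketched with a deliberately crude toy. The Python example below is not one of the models described in this chapter; it is a hypothetical linear recursion invented for illustration. A single cultural trait is pulled toward a genetic fitness optimum by decision-making forces of strength decision_strength and toward some other value by forces acting on nonparentally transmitted variation of strength cultural_selection. All names, functional forms, and numbers are assumptions.

```python
def equilibrium_trait(decision_strength, cultural_selection,
                      fitness_optimum=0.0, cultural_optimum=1.0,
                      x0=0.5, generations=500):
    """Iterate a toy recursion for the mean value of one cultural trait.

    Each generation the mean moves a fraction decision_strength of the way
    toward fitness_optimum and a fraction cultural_selection of the way
    toward cultural_optimum. Hypothetical form; assumes the two strengths
    are positive and sum to less than 1 so the recursion converges.
    """
    x = x0
    for _ in range(generations):
        x = (x
             + decision_strength * (fitness_optimum - x)
             + cultural_selection * (cultural_optimum - x))
    return x


# Strong, cheap decision making: the trait sits near the fitness optimum (0).
strong_decisions = equilibrium_trait(0.5, 0.01)
# Costly decision making, strong nonparental forces: the trait sits near 1.
weak_decisions = equilibrium_trait(0.01, 0.5)
print(strong_decisions, weak_decisions)
```

In this toy the equilibrium is simply the strength-weighted average of the two optima, so the whole controversy collapses into the ratio of two parameters. Real models are not so simple, but the sketch shows why measuring the relative strengths of the forces is the substantive empirical question.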

As primitive as our own models are in this regard (see also Pulliam and 
Dunford, 1980; Werren and Pulliam, 1981; Pulliam, 1982, 1983), we think they 
are a promising step. The costs of decision making and the extent to which 
important items of culture are transmitted by nonparental individuals are empirical
issues that can be resolved. Indeed, data already exist on these points
(Boyd and Richerson, 1985: chs. 3 and 5). It would be overenthusiastic to claim 
that any of the controversial questions surrounding the application of Darwinism
to human culture are resolved, but we do believe that the modest body of formal
theory so far developed, and empirical argument derived from the theory,
has clarified the issues to the extent that rapid progress is now possible. 

A well-developed plausibility argument differs sharply from another common
type of argument that we call a programmatic claim. Most generally, a
programmatic claim advocates a plan of research for addressing some outstanding
problem without, however, attempting to construct a full plausibility
argument. Programmatic claims can be exceedingly useful; the development of 
a Darwinian theory of culture was greatly stimulated by mostly programmatic 
essays such as those by Campbell (1965), Ruyle (1973), and Cloak (1975). 
However, they are useful only insofar as they indicate the possibility of, or need 
for, new plausibility arguments. An attack on an existing, often widely accepted, 
plausibility argument on the grounds that the plausibility argument is incomplete
is a kind of programmatic claim. Critiques of human sociobiology are commonly
of this type. Burden-of-proof claims are another variant. For example,
sociobiologists often seem to imply that the general success of adaptive reasoning 
in biology means that the existence of any prima facie plausible adaptive interpretation
of human behavior is a sufficient counter to anything but a perfect
case for a nonadaptive explanation. 

Programmatic attacks and burden-of-proof claims can be positively harmful 
when taken, by themselves, as sufficient substitutes for a sound plausibility argument.
We have argued that theory about complex-diverse phenomena is
necessarily made up of simple models that omit many details of the phenomena 
under study. It is very easy to criticize theory of this kind on the grounds that it is 
incomplete (or defend it on the grounds that it one day will be much more 
complete). Such criticism and defense is not really very useful because all such 
models are incomplete in many ways and may be flawed because of it. What is 
required is a plausibility argument that shows that some factor that is omitted 
could be sufficiently important to require inclusion in the theory of the phenomenon
under consideration, or a plausible case that it really can be neglected
for most purposes. Thus, for example, it is not enough to attack a purportedly 
general plausibility argument with a few special cases, for it is (or ought to be) 
stipulated that generalized models are always likely to account more or less poorly 
for many special cases. In contrast, the success of genetic fitness-maximizing 
theory in biology cannot be used to defend that generalized model in the face of 
plausible arguments that cultural evolution is a divergent special case. 

It seems to us that until very recently, “nature-nurture” debates have been 
badly confused because plausibility arguments have often been taken to have 
been successfully countered by programmatic claims. It has proved relatively easy 
to construct reasonable and increasingly sophisticated Darwinian plausibility 
arguments about human behavior from the prevailing general theory. It is also 
relatively easy to spot the programmatic flaws in such arguments; conventional 
Darwinian models do not allow for human culture. The problem is that programmatic
objections have not been taken to imply a promise to deliver a full
plausibility claim. Rather, they have been taken as a kind of declaration of independence
of the social sciences from biology. Having shown that the biological
theory is in principle incomplete, the conclusion is drawn that it can safely be
ignored. Sahlins’s (1976) objections to human sociobiology seem to us to have 
been as much in this tradition as Tarde’s (1903:xxi-xxii) very early one. Both 
arguments ignore that Darwinian plausibility arguments ordinarily contain a serious
rationale for accepting their claims despite the unique aspects of the human
species. Certainly this is the case with contemporary human sociobiology and 
explains why it has attracted support by social scientists like van den Berghe 
(1979, 1981), who cannot be accused of simpleminded hereditarianism. 


The Importance of Scientific Pluralism 

Jared Diamond (personal communication) has drawn the following useful lesson 
from his experience as both a physiologist and a community ecologist: in physiology,
controversial issues are ordinarily settled quickly by definitive experiments.
As a result, debate over contending hypotheses is quite restrained and
polite. One or the other contending claim is almost certain to turn out wrong in 
short order, and any grandiose pronouncements, ad hominem attacks, or similar 
departures from polite scientific discourse can be held against the loser. As long 
as scientists know that they can easily be proven wrong by a few critical experiments
in the next few years, they will refrain from such departures. In ecology,
major controversies last much longer because the issues are more complex and 
testing contending plausibility arguments is a long-drawn-out affair. The result is 
that individual claimants are often unlikely to be proven cleanly right or wrong, 
at least during their own lifetimes. Rhetorical excesses thus cannot be clearly 
proven as such by the failure of the programmatic claim or plausibility argument 
to which they are attached, and consequently the motivation to avoid them is 
reduced. 

Perhaps differences between these two disciplines can be understood in 
terms of Campbell’s (1979) general discussion of scientific honesty. According to 
Campbell, scientists are more honest in their occupational behavior than other 
professionals, but not because they are morally superior as individuals. Rather, 
they are careful to present honest work because other scientists are very discriminating
consumers. Scientists frequently replicate crucial experiments and can
gain prestige by detecting errors. In a controversy, many members of the community
will act as relatively unbiased judges of the acceptability of contending
hypotheses because their own work depends on using the correct result—say, to 
make a more accurate measurement instrument. Such acceptors have an interest 
in the resolution of the controversy but not a vested interest in any particular 
outcome. It seems likely that this mechanism will work much more effectively 
when controversial issues are resolved quickly, and consumer/acceptors can 
confidently use secure results in their own work. In the case of evolutionary and 
ecological problems, ambiguity lasts longer, and consumers may be forced to 
choose among plausibility arguments, thus coming to have a vested interest in 
the controversy. The extensive empirical program of the complex-diverse disciplines
reduces the incentive to replicate individual experiments directly because
they make so small a contribution to the total program.

Campbell (1969, 1986) contributed an insightful analysis of another potentially
serious problem in the study of complex-diverse subject matters: the
social complexity of the sciences that study them. Specialization is obviously 
demanded by complexity and diversity. But there is no guarantee that disciplines 
will not evolve what Campbell characterized as parochial “tribal” norms and 
customs that impede scientific progress. His argument is illustrated with reference
to the arbitrary disciplinary boundaries, schools within disciplines, and the
resulting “ethnocentrism” within the social sciences. Our impression is that
the scientific endeavor becomes more prone to “ethnocentrism” as problems become
more complex and diverse; certainly evolutionary biology, despite the
unifying value of Darwinism, is not immune. As the enforcement of the universalistic
norms of scientific discourse weakens, very human motives, such as a
desire for collegial relations within one’s discipline, a tendency to find that one’s 
extrascientific ideology can be squared one way or another with one’s science, 
career considerations, and a need to economize on information, can easily lead 
the social structure of science in directions that reduce its collective ability to 
solve complex-diverse problems. The mental effort of keeping multiple, partly 
conflicting, plausibility arguments in mind, the ambiguous relationship of these 
to ideas and norms derived from other roles, and the need to have some knowledge
of several unfamiliar disciplines might be psychological motivations that
encourage the formation of independent disciplines and schools with little communication
between them. Nevertheless, it seems inescapable that complex-diverse
subjects demand free communication between specialists and a wide
tolerance for the pursuit of temporarily divergent plausibility claims. 

Deriving norms from this diagnosis is by no means straightforward. Perhaps 
new disciplines and new ideas need a measure of isolation, which the development
of ethnocentric and sectarian attitudes affords (Campbell, 1985; Beatty,
1987). On the other hand, unchecked, this process can result in a declaration of 
independence for a mature discipline, such as Sahlins offers for anthropology, 
which may be wholly harmful. There may be an optimal amount of disciplinary 
and research program “ethnocentrism” for maximizing scientific progress at any 
given time. 

Nonetheless, we think that the following two norms would, if adopted, 
improve scientific debate surrounding complex, diverse subjects. 

Ad hominem attacks on particular positions and the use of self-serving 
programmatic claims should be viewed as tacky. Given the deep importance of 
human behavior to humans, the weakness of the consumer/acceptor mechanism 
for regulating academic discourse, and the fact of the evolution of “ethnocentric”
norms within disciplines, it is utopian to expect that the temptation to
behave in such ways will always be resisted, particularly by those who are
legitimately pursuing a position. Widespread agreement that such behavior is
moderately offensive is perhaps a practical norm and might help to further
productive debate over real issues.

Scientists should be encouraged to take a sophisticated attitude toward 
empirical testing of plausibility arguments (Quinn and Dunham, 1983; Diamond,
1986). Folk Popperism among scientists has had the very desirable result
of reducing the amount of theory-free descriptive empiricism in many
complex-diverse disciplines, but it has had the undesirable effect of encouraging
a search for simple mutually exclusive hypotheses that can be accepted or rejected
by single experiments. By our argument, very few important problems in
evolutionary biology or the social sciences can be resolved in this way. Rather, 
individual empirical investigations should be viewed as weighing marginally for 
or against plausibility arguments. Often, empirical studies may themselves discover
or suggest new plausibility arguments or reconcile old ones.


Conclusion 

We confess to being somewhat puzzled by the debate between the “adaptationists”
and their critics. We suspect that most evolutionary biologists and
philosophers of biology on both sides of the dispute would pretty much agree 
with the defense of the simple models strategy presented here. To reject the 
strategy of building evolutionary theory from collections of simple models is to 
embrace a kind of scientific nihilism in which there is no hope of achieving an 
understanding of how evolution works. On the other hand, there is reason to 
treat any given model skeptically. As Kitcher (1987) notes, his criticisms of 
optimality arguments are not meant as “forlorn skepticism,” but rather as helpful 
“in pinpointing strategies for improving hypotheses about selective pressures 
and functional significance” (p. 99). Kitcher quite properly and quite explicitly 
calls attention to the fact that because diversity and complexity are real, the 
tactic of seeking understanding via simple models is something that must be
done with care. No one ought to disagree. 

Unfortunately, the critics of “adaptationism” are not always as sophisticated
as this; they sometimes seem to want to benefit rhetorically from a programmatic
critique that implies scientific nihilism without having to face the real (and
extremely unpleasant) consequences of actually adopting it. It may be possible to
defend the proposition that the complexity and diversity of evolutionary phenomena
make any scientific understanding of evolutionary processes impossible.
Or, even if we can obtain a satisfactory understanding of particular cases of 
evolution, any attempt at a general, unified theory may be impossible. Some 
critics of adaptationism seem to invoke these arguments against adaptationism 
without fully embracing them. The problem is that alternatives to adaptationism 
must face the same problem of diversity and complexity that Darwinians use the 
simple model strategy to finesse. The critics, when they come to construct 
plausibility arguments, will also have to use relatively simple models that are 
vulnerable to the same attack. If there is a vulgar sociobiology, there is also a vulgar 
criticism of sociobiology. Perhaps because we have devoted a considerable effort 
to building a plausibility argument for the novel and sometimes maladaptive role 
of culture in human evolution, we are very sensitive to the strength of the
sociobiologists’ plausibility arguments and the weakness of most of the objections to
them. 

In our opinion, human sociobiology has been a successful research program 
because it has made rather good use of the simple models strategy. Its practitioners
have taken care to construct sound plausibility arguments and, in the
spirit of scientific pluralism, to use the work of social scientists. As pursuers of a
somewhat narrow range of plausibility arguments, their work is not above criticism
in detail or in general. As befits pursuers, they have usefully driven the
fitness-optimizing postulate to extremes that are not likely to be ultimately 
warranted. Less usefully, they have used a burden-of-proof claim to attempt to 
insulate sociobiology from counterarguments. On the other hand, the attacks on 
sociobiology are a good source of negative object lessons. The criticism of human 
sociobiology has far too frequently depended on mere programmatic claims 
(often invalid ones at that, as when sociobiologists are said to ignore the importance
of culture and to depend on genetic variation to explain human differences).
These claims are generally accompanied by dubious burden-of-proof
arguments. Some critics also show little sense of the importance of scientific 
pluralism. 


NOTE 

We thank D. T. Campbell, J. M. Diamond, J. M. Emlen, G. Macey, A. Rosenberg, 
E. A. Smith, J. Staddon, & S. Vail for comments on drafts of this chapter. We also 
benefited from conversations with J. Quinn and J. Griesemer.


20 Memes

Universal Acid or a 
Better Mousetrap? 


Among the many vivid metaphors in Darwin's Dangerous Idea, one 
stands out. The understanding of how cumulative natural selection gives rise to 
adaptations is, Daniel Dennett says, like a “universal acid”—an idea so powerful
and corrosive of conventional wisdom that it dissolves all attempts to contain it 
within biology. Like most good ideas, this one is very simple: once replicators 
(material objects that are faithfully copied) come to exist, some will replicate 
more rapidly than others, leading to adaptation by natural selection. The great 
power of the idea is that the resulting adaptations can be understood by asking 
what leads to efficient, rapid replication. Given that ideas seem to replicate, it is 
natural that Dawkins (1976, 1982), Dennett (1995), and others have explored 
the possibility of using this idea to explain cultural evolution. 

Natural selection was not Darwin’s only powerful, far-reaching idea. Ernst 
Mayr (1982) has argued that what he calls “population thinking” was also among 
Darwin’s foundational contributions to biology. Before Darwin, species were 
thought to be essential, unchanging types, like geometric figures and chemical 
elements. Darwin saw that species were populations of organisms that carried a 
variable pool of inherited information through time. To understand the evolution 
of species, biologists had to account for the processes that changed the nature of 
that inherited information. Darwin thought that the most important processes 
were natural selection, sexual selection, and the “inherited effects of use and disuse.”
We now know that the last process is not important in organic evolution—
unlike Darwin, modern biologists do not believe that the sons of blacksmiths 
inherit their father’s mighty biceps. Nowadays biologists think many processes 
that Darwin never dreamed of are important, including segregation, recombination,
gene conversion, and meiotic drive. Nonetheless, modern biology is
fundamentally Darwinian because its explanations of evolution are rooted in
population thinking. If Darwin were to be resurrected tomorrow through some 
miracle of cloning, we think he would be quite happy with his legacy. 

In this chapter we want to convince you that population thinking, not 
natural selection, is the key to conceptualizing culture in terms of material 
causes. This argument is based on three well-established facts: 

1. There is persistent cultural variation among human groups. Any explanation
of human behavior must account for how this variation
arises and how it is maintained. 

2. Culture is information stored in human brains. Every human culture 
contains vast amounts of information. Important components of 
this information are stored in human brains. 

3. Culture is derived. The psychological mechanisms that allow culture 
to be transmitted arose in the course of hominid evolution. Culture 
is not simply a by-product of intelligence and social life. 

Much of culture is information stored in human brains—information that 
got into those brains by various mechanisms of social learning. It follows that 
to explain the distribution of information stored in the brains of the members 
of the current generation, any coherent theory will have to account for the 
cultural information in the brains of the previous generation. The theory will 
also have to explain how this information, together with genes and environmental
contingencies, caused the present generation to acquire the cultural
information that it did. Unfortunately, we do not understand how this process 
works. It may be that cultural information stored in brains takes the form of 
discrete memes that are replicated faithfully in each subsequent generation, or 
it may not. This is an empirical question that at present is unanswered, and 
we will see that other models are possible. In every case, the Darwinian population
approach will illuminate the process by which the cultural information
that is stored in a population of brains is transformed from one generation to the 
next. 

We also want to convince you that population thinking can play an important,
constructive role in the human sciences. The fact that population
thinking is logically necessary for a natural, causal theory of culture does not
necessarily mean that such a theory will be useful. Thus, we know that human 
culture must be consistent with quantum mechanics, but it is unlikely that such 
a connection will help us understand, say, ethnic conflict. However, we think 
Darwinian models of culture are useful for two reasons. First, they serve to 
connect the rich models of behavior based on individual action developed in 
economics, psychology, and evolutionary biology with the data and insights of 
the cultural sciences: anthropology, archaeology, and sociology. In doing so, we
think that they can help shed light on important unsolved problems in the social 
sciences. Second, population thinking is useful because it offers a way to build 
a mathematical theory of human behavior that captures the important role of 
culture in human affairs. Population thinking is not a universal acid that will 
dissolve existing social sciences. But it is a better mousetrap, providing useful 
new tools that can help solve outstanding problems in the human sciences.


Culture Is Heritable at the Group Level

One of the striking facts about the human species is that there are important, 
persistent differences between human groups that are created by culturally 
transmitted ideas, not genetic differences, or differences in the physical or biotic 
environment. Sonya Salamon’s (1992) research on immigrant communities in the 
United States shows how cultural differences can give rise to different behaviors 
in the same environment. One of Salamon’s studies focused on two farming 
communities in southern Illinois. “Freiburg” (a pseudonym) is inhabited by the
descendants of German-Catholic immigrants who arrived in the area during the 
1840s. “Libertyville” (also a pseudonym) was settled by people from other parts 
of the United States—mainly Kentucky, Ohio, and Indiana—when the railroad 
arrived in 1870. These two communities are only about 20 miles apart and have 
been carefully matched for similar soil types. 

The people in these two communities have different values about family, 
property, and farm practice, and these differences seem consistent with their 
ethnic origins. The farmers of Freiburg tend to value farming as a way of life, and 
they want at least one son or daughter to continue as a farmer. In Freiburg, wills 
specify that the farm will go to a child who will farm the land and use farm 
proceeds to buy out any nonfarming siblings. Parents put considerable pressure 
on children to become farmers. They place little importance on education, 
knowing that advanced education often results in young people not returning to 
the farm. Salamon argues that these “yeoman” values are similar to those observed
among peasant farmers in Europe and elsewhere. In contrast, the “Yankee”
farmers of Libertyville regard their farms as profit-making businesses. They
buy or rent land depending on economic conditions, and if the price is right, they 
sell. Many Yankee farmers would prefer their children to continue farming, but 
they see it as an individual decision. Some families help their children enter 
farming, but many do not, and they generally place a strong value on higher 
education. 

The differences in values between Freiburg and Libertyville lead to measurable
differences in farm practices despite the proximity of the two towns and
the similarity of their soils. Farms are substantially larger in Libertyville—the 
mean size of farm operations in Libertyville is 518 acres compared to 276 acres 
in Freiburg. The Libertyville farms are larger because Yankee farmers rent more 
land. They rent more land because Yankees demand a higher income to stay in 
farming. Yeomen, who so value farming for its own sake, are content with lower 
incomes and fear the risks of debt-financed expansion. 

The two communities also show striking differences in farm operations. In 
Libertyville, as in most of southern Illinois, farmers specialize in grain production.
It is the primary source of income for 77 percent of the farmers in Libertyville.
In Freiburg, many people mix grain production with dairying or livestock
raising, activities that are almost absent in Libertyville. Because animal husbandry 
is labor-intensive, it allows Germans to accommodate their larger families on 
their more limited acreage. Yankee farmers decided against dairying and stock 
raising because grain farming is more profitable and less work.

The fact that culturally distinctive human groups behave differently in the
same environment implies that culture is heritable, at least at the group level. 
Many beliefs and values that are common in a group at one point in time are also 
common among the descendants of the same group. Any theory of how culture 
works must be consistent with this fact. It must explain why the German farmers 
of Freiburg hold different beliefs about life and land than their Yankee neighbors 
almost 150 years after leaving Europe. 


Culture Is Information Stored in Human Brains

Every human culture contains an enormous amount of information. Consider 
how much information must be transmitted to maintain a particular distinctive 
spoken language. A lexicon requires something like 10,000 associations between 
words and their meanings. Grammar entails a complex set of rules regulating 
morphosyntax, and although the extent to which these rules arise
from innate, genetically transmitted structures is unclear, it is clear that the rules that
underlie the grammatical differences that separate English and Chinese are culturally
transmitted. Subsistence techniques also entail large amounts of information.
For example, Blurton-Jones and Konner (1976) showed that the !Kung
San have a very detailed knowledge of the natural history of the Kalahari—so
detailed, in fact, that the researchers were unable to judge the accuracy of much
of !Kung knowledge because in some aspects it exceeded Western biology. As
anyone who has ever tried to make a decent stone tool can attest, the manufacture
of even the simplest tool requires lots of knowledge; more complex
technologies require even more. Imagine the instruction manual for constructing 
a seaworthy kayak from materials available on the North Slope of Alaska. The 
institutions that regulate social interactions incorporate still more information. 
Property rights, religious custom, roles, and obligations all require a considerable 
amount of detailed information. 

The vast store of information that exists in every culture cannot simply float 
in the air. It must be encoded in some material object. In societies without 
widespread literacy, the most important objects in the environment capable of 
storing this information are human brains and human genes. It is undoubtedly 
true that some cultural information is stored in artifacts. It may well be that the 
designs that are used to decorate pots are stored on the pots themselves and that 
when young potters learn how to make pots they use old pots, not old potters, as 
models. In the same way, the architecture of the church may help store information
about the rituals performed within. Without writing, however, the
ability of artifacts to store culture is quite limited. First, many artifacts are very 
difficult to reverse-engineer. The young potter cannot learn how to select clay 
and temper or how to fire a pot by studying existing ones. Second, much cultural 
information is semantic knowledge—how can an artifact store the notion that 
Kalahari porcupines are monogamous? Or the rules that govern bride-price 
transactions? 

It is also clear that much cultural information is not stored in human genes. 
In one sense this is obvious. The evidence is very clear that very little cultural
variation results from genetic differences. We know that genetic differences do not
explain why some people speak Chinese and others English, or why the !Kung
know a lot more about the biology of porcupines than most readers of this chapter. 

However, there is a subtle and much more plausible way that genes could 
store cultural information. It could be that most human culture is innate, genetically
transmitted information that is evoked by environmental cues. Pascal
Boyer (1994) argues that much of religious belief has this character. For example,
the Fang, a group Boyer studied in Cameroon, have elaborate beliefs
about ghosts. For the Fang, ghosts are malevolent beings that want to harm the
living; they are invisible and can pass through solid objects, and so on. Boyer
argues that most of what the Fang believe about ghosts is not culturally transmitted;
rather, it is based on the innate, epistemological assumptions that underlie
all cognition. Once a young Fang child learns that ghosts are sentient
beings, she does not need to learn that ghosts can see or that they have beliefs 
and desires—these components are provided by cognitive machinery that reliably 
develops in every environment. According to this view, cultural differences arise 
because different environmental cues evoke different innate information. A 
friend of ours believes in angels instead of ghosts because he grew up in an 
environment in which people talked about angels. However, most of what he 
knows about angels comes from the same cognitive machinery that gives rise to 
Fang beliefs about ghosts, and the information that controls the development of 
this machinery is stored in the genome. 

This picture of culture is a useful antidote to the simplistic view that culture 
is simply poured from one head into another. Evolutionary psychologists are 
surely right that every form of learning, including social learning, requires an 
information-rich innate psychology and that much of the adaptive complexity 
we see in cultures around the world stems from this information. However, it 
is a big mistake to ignore transmitted cultural information. The single most 
important adaptive feature of culture is that it allows the gradual, cumulative 
assembly of adaptations over many generations—adaptations that no single individual
could invent on his own. Cumulative adaptation cannot be based solely
on innate, genetically encoded information. 

Consider the evolution of a relatively simple form of technology, the mariners’
magnetic compass (Needham, 1978). First, Chinese geomancers noticed
the peculiar tendency of small magnetite objects to orient in the earth’s magnetic 
field, an effect that they used for purposes of divination. Then, Chinese mariners 
learned that magnetized needles could be floated on water to indicate direction 
at sea. Next, over several centuries Chinese seamen developed a dry compass 
mounted on a vertical pin-bearing, like a modern toy compass. Europeans acquired
this type of compass in the late medieval period. European seamen then
developed the fixed card compass that allowed a helmsman to steer an accurate 
course by aligning the bow mark with the appropriate compass point. Compass 
makers later learned to adjust iron balls near the compass to zero out the 
magnetic influence from the ship and to gimbal the compass and fill it with 
liquid to damp the motion imparted to the card by the roll and pitch of the ship. 
Even such a relatively simple tool was the product of at least seven or eight innovations
separated in time by centuries and in space by the breadth of Eurasia.
This sort of adaptation occurs only because novel information can accumulate in
human populations, be stored in human brains, and be transmitted through time 
by teaching and imitation. 

Evolutionary psychologists argue that our psychology is built of complex, 
information-rich, evolved modules that are adapted for the hunting and gathering 
life that we pursued until the origins of agriculture a few thousand years ago. On 
this argument, humans can easily and naturally do the things we are really adapted 
to do, like learn a language or understand the feelings of others. Inventing complex
modern artifacts like the compass is hard, but what about skills necessary for
hunting and gathering? Couldn’t we learn these as easily as we learn language? 
Doesn’t our brain contain the information necessary to follow hunting and 
gathering ways? Our ancestors lived as hunter-gatherers of some kind for the last 
2 or 3 million years. If we had to do so, couldn’t we reinvent that stuff, just as Fang 
children invent the properties of their ghosts, or children can invent a grammar? 

Good questions, but we think the answer is almost certainly “Are you nuts?!” 
Consider the following thought experiment. Suppose you are stranded in some 
not-too-extreme desert environment, not the Empty Quarter or the Atacama, but 
the desert between Sonoita, Mexico, and Yuma, Arizona. Your task is to survive 
and raise your kids without modern technology. You will be given the resources to 
survive a few months to get your feet on the ground before we take away your last 
tin of food and your last steel tool—a little time to see what comes naturally. Will 
you make it? 

We don’t think so. The stretch between Sonoita and Yuma is known as El 
Camino del Diablo, “the Devil’s Road.” It was one leg of the main overland
route from Old Mexico to California until the coming of railroads. For more 
than a century it was used by Spanish, Mexican, and American travelers. To get 
that far, every traveler had to already be an experienced frontiers-person, and 
no doubt most were hardbitten, desert-wise, and well equipped with familiar 
technology. It was the best of several bad routes and was comparatively well 
known and well marked. Still, it was an infamous leg of the journey, and many 
travelers ended up in the hasty graves that litter the route. 

Now, consider that the Camino del Diablo was also the home to Papago
Indians who, with a few pounds of wood, stone, and bone equipment, an impressive
amount of hard-won knowledge, and a well-adapted system of social
institutions, lived and raised their children in the very same desert that killed so 
many pioneers. If our task was to survive in this desert without our accustomed 
industrial technology, we would certainly trade a few hours of tutoring by a 
traditional Papago for any number of months trying to summon an innate 
knowledge of the desert. 


Culture Is Derived 

Simple forms of social learning, often termed “protoculture,” occur in many 
other species of animals. In a review of the social transmission of foraging 
behavior, Lefebvre and Palameta (1988) give 97 examples of protocultural variation
in foraging behavior in animals as diverse as baboons, sparrows, lizards,
and fish. Much of the evidence for protoculture in other animals consists of
observations of different behavior by populations of the same species living in 
similar environments. For example, chimpanzees in the Mahale Mountains of 
Tanzania often adopt a unique grooming posture in which both partners extend 
one arm over their heads, clasp hands, and then groom one another’s exposed 
arm pits. These grooming hand-clasps occur often and are performed by all 
members of the group. Chimpanzees at Gombe, who live less than 100 kilometers
away in a similar type of habitat, often groom but never perform this
behavior. Sometimes scientists have observed the spread of a novel behavior. 
One famous example comes from Japan where a group of Japanese macaques, 
whose range included a sandy beach, were provisioned with sweet potatoes. 
A young female macaque accidentally dropped her sweet potato into the sea 
as she was trying to rub the sand off it. She must have liked the result, as she 
began to carry all of her potatoes to the sea to wash them. Other monkeys 
followed suit. However, it took other members of the group quite some time to
acquire the behavior and many monkeys never washed their potatoes. Finally, 
some evidence for protoculture in other animals comes from experiments that 
demonstrate that behavior is socially transmitted. The most famous case is the 
transmission of song dialects in birds like the white-crowned sparrow. 

There is little evidence, however, of cumulatively evolved cultural traditions 
in other species. With a few exceptions, social learning leads to the spread of 
behaviors that individuals could have learned on their own. For example, food 
preferences are socially transmitted in rats. Young rats acquire a preference for a 
food when they smell the food on the pelage of other rats (Galef, 1988). This 
process can cause the preference for a new food to spread within a population. It 
can also lead to behavioral differences among populations living in the same 
environment, because current foraging behavior depends on a history of social 
learning. However, it does not lead to the cumulative evolution of complex new 
behaviors that no individual rat could learn on its own. Thus, in other animals it 
is quite plausible that most of the detailed information that creates protocultural 
differences is stored and transmitted genetically. 

Circumstantial evidence suggests that the ability to acquire novel behaviors 
by observation is essential for cumulative cultural change. Students of animal 
social learning distinguish observational learning, which occurs when younger 
animals observe the behavior of older animals and learn how to perform a novel 
behavior by watching them, from a number of other mechanisms of social
transmission, which also lead to behavioral continuity without observational learning
(Galef, 1988; Visalberghi and Fragaszy, 1990; Whiten and Ham, 1992). One
such mechanism, local enhancement, occurs when the activity of older animals 
increases the chance that younger animals will learn the behavior on their 
own. Imagine a young monkey acquiring its food preferences as it follows 
its mother around. Even if the young monkey never pays any attention to what 
its mother eats, she will lead it to locations where some foods are common 
and others rare, and the young monkey may learn to eat much the same foods 
as mom. 

Local enhancement and observational learning are similar in that they 
can both lead to persistent behavioral differences among populations, but only 
observational learning allows cumulative cultural change (Tomasello, Kruger, and
Ratner, 1993). To see why, consider the cultural transmission of stone tool use. 
Suppose that occasionally early hominids learned to strike rocks together to 
make useful flakes. Their companions, who spent time near them, would be 
exposed to the same kinds of conditions, and some of them might learn to make 
flakes too, entirely on their own. This behavior could be preserved by local 
enhancement because groups in which tools were used would spend more time 
in proximity to the appropriate raw materials. However, that would be as far as 
tool-making would go. Even if an especially talented individual found a way to 
improve the flakes, this innovation would not spread to other members of the 
group because each individual learned the behavior anew, without any detailed 
guidance from innovators who have improved on the common technique. Local 
enhancement is limited by the learning capabilities of individuals and the fact 
that each new learner must start from scratch. With observational learning, 
on the other hand, innovations can be incorporated into others’ behavioral
repertoires if younger individuals are able to acquire the improved behavior by
observational learning. To the extent that observers can use the behavior of 
models as a starting point, observational learning can lead to the cumulative 
evolution of behaviors that no single individual could invent on its own. 
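
The contrast between the two mechanisms can be made concrete with a toy
simulation. The sketch below is not a model taken from the works cited here. It
assumes, purely for illustration, that skill is an integer count of
improvements and that any learner can add at most one improvement to its
starting point. The names `final_skill` and `p_innovate` are hypothetical.

```python
import random


def final_skill(generations, observational, p_innovate=0.3, seed=1):
    """Skill of the last learner in a chain of `generations` learners.

    Illustrative assumptions: skill is a count of improvements, and each
    learner adds at most one improvement to wherever it starts.
    """
    rng = random.Random(seed)
    skill = 0
    for _ in range(generations):
        # Local enhancement: the learner is exposed to the raw materials
        # but starts from scratch. Observational learning: the learner
        # starts from the skill level of the previous individual.
        start = skill if observational else 0
        skill = start + (1 if rng.random() < p_innovate else 0)
    return skill


if __name__ == "__main__":
    print("local enhancement:     ", final_skill(200, observational=False))
    print("observational learning:", final_skill(200, observational=True))
```

Under these assumptions the local-enhancement chain never exceeds one
improvement, however long it runs, while the observational chain accumulates
improvements roughly in proportion to its length.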

Adaptation by cumulative cultural evolution is apparently not a by-product 
of intelligence and social life. Capuchin monkeys are among the world’s cleverest 
creatures. They resemble apes in having quite large brains for their size. In
nature, they perform many complex behaviors, and in captivity they can be taught
extremely demanding tasks. Capuchins live in social groups and have ample 
opportunity to observe the behavior of other individuals of their own species. 
Yet good laboratory evidence indicates that these monkeys make little or no use 
of observational learning (Visalberghi and Fragaszy, 1990). Observational
learning is not simply a by-product of intelligence and the opportunity to observe
conspecifics. Rather, it seems to require special psychological mechanisms 
(Bandura, 1986). This conclusion suggests that the psychological mechanisms 
that enable humans to learn by observation are adaptations that have been 
shaped by natural selection in the human lineage because culture is beneficial. 


Cultural Evolution Is Darwinian 

Now, let us consider what these facts imply for a theory of culture. Consider a 
population of individuals who are culturally interconnected; they speak dialects 
of a single language, use similar technology, share relatively similar beliefs about 
the world, and have similar moral values. People in this population think and 
behave differently from other peoples, in part, because they have different 
culturally transmitted information stored in their brains. Next consider the 
descendants of this population, say 100 years later. The culture of the
descendant population will be similar in many ways to that of their predecessors. Their
language will be similar, and they may often use similar technology, have similar 
beliefs about the world, and subscribe to a similar moral system. The fact that
culture depends on information stored in the brains of this population requires us to
account for how the information that generates these similarities was
transmitted from the brains in the first population to the brains in the second.

Of course, there will also be differences between the two populations, some 
small, some great. Some of these differences will arise because some behaviors 
are more common in the second population—for example, perhaps what was 
previously a rare usage or form of pronunciation has become common. Other 
differences will arise because genuinely new behavior is present, either as a result 
of borrowing from neighboring populations or due to genuine innovation. Thus, 
a complete theory would also have to account for why some forms of cultural 
information spread, and why some forms have diminished, and how innovation 
occurs. 

Cumulative cultural change requires observational learning. People observe 
the behavior of others, and (somehow) acquire the information necessary to 
produce a reasonable facsimile of the same behavior. In any given time period, 
each person observes only a sample of the people who make up his population. A 
very small child is exposed mainly to the people in her family, older children are 
exposed to peers and teachers, and adults to yet a wider range of people. We will 
refer to this group of people as an individual’s “cultural sample.” For most of
human history cultural samples were small, but nowadays they may be immense. 
On the other hand, for some elements of culture many people may be
disproportionately influenced by a single charismatic leader or acknowledged expert.

The fact that cultures often persist over time with little change means that 
the commonness of a behavior in an individual’s cultural sample must have a 
positive effect on the probability that the individual ultimately acquires the 
cultural information that generates that behavior. Such a tendency could arise in 
several different ways: if observational learning takes the form of approximately 
unbiased copying, then common behaviors will be more frequent in cultural 
samples, and therefore will be more likely to be copied. It could also be that the 
psychology of observational learning itself predisposes people to acquire more 
common behaviors. Finally, it could be that rare behaviors are typically
disadvantageous and less likely to be retained as a result of individual learning and
experimentation, or even by natural selection against them. 
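
The first two possibilities can be compared numerically. The sketch below
assumes a population in which a fraction `p` of people show a behavior, and a
learner who draws a cultural sample of `n` people at random. The function names
and the majority rule used to stand in for a "predisposition to acquire more
common behaviors" are illustrative assumptions, not definitions from the text.
The third possibility (selective retention of advantageous behaviors) is not
modeled.

```python
from math import comb


def p_adopt_unbiased(p, n):
    """Chance of acquiring the behavior by copying one sample member at random."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * (k / n)  # k of n models show it
        for k in range(n + 1)
    )


def p_adopt_majority(p, n):
    """Chance of acquiring the behavior when the learner adopts the majority
    behavior in the sample (n is assumed to be odd, so ties cannot occur)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.7):
        print(p, round(p_adopt_unbiased(p, 3), 3), round(p_adopt_majority(p, 3), 3))
```

In both cases the chance of acquiring a behavior rises with its commonness.
Unbiased copying simply reproduces the population frequency, whereas the
majority rule exaggerates it: a behavior shown by 70 percent of the population
is acquired with probability 0.784 when the sample size is three.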

It follows that cultural change is a population process. The argument
proceeds in several steps:

• To understand how a person behaves, we have to know the nature of 
the information stored in her brain 

• To understand why people have the beliefs that they do, we must 
know what kinds of behaviors characterized their cultural sample 

• To predict the distribution of cultural samples that exists, one must 
know the cultural composition of the population 

• Therefore, to understand how people behave, we must understand 
why the population has the cultural composition that it does 

Similarities between descendant and ancestral populations arise because the 
necessary information has been transmitted from individual to individual 
through time without significant change. Differences occur because some
variants have become more common, others have become more rare, and some
completely new variants have been introduced. Thus, to account for both
continuity and change we need to understand the population processes by which
ideas are transmitted through time. 


Culturally Transmitted Skills and Beliefs May 
Not Be Replicators 

In The Extended Phenotype, Richard Dawkins (1982) argues that the cumulative 
evolution of complex adaptations requires what he calls replicators, things in the 
physical world that produce copies of themselves and have the following 
three additional properties: 

1. Fidelity. The copying must be sufficiently accurate that even after a 
long chain of copies the replicator remains almost unchanged. 

2. Fecundity. At least some varieties of the replicator must be capable 
of generating more than one copy of themselves. 

3. Longevity. Replicators must survive long enough to affect their own 
rate of replication. 

Replicators give rise to cumulative adaptive evolution because replicators 
are targets of natural selection. Genes are replicators—they are copied with 
astounding accuracy, they can spread rapidly, and they persist throughout the 
lifetime of an organism, directing its machinery of life. Dawkins thinks that 
beliefs and ideas are also replicators. On the face of it, this is an apt analogy. 
Beliefs and ideas can be copied from one mind to another, spreading through a 
population, controlling the behavior of people who hold them. 

But there are reasons to doubt that beliefs and skills are replicators, at least 
in the same sense that genes are. Unlike genes, ideas are not copied and
transmitted intact from one brain to another. Instead, the information in one brain
generates some behavior; somebody else observes this behavior and then
(somehow) creates the information necessary to generate very similar behavior. The
problem is that there is no guarantee that the information in the second brain is 
the same as the first. For any phenotypic performance, there are potentially an 
infinite number of rules that would generate that performance. Information will 
be transmitted from brain to brain only if most people induce a unique rule from 
a given phenotypic performance. While this may often be the case, it is also 
plausible that genetic, cultural, or developmental differences among people may 
cause them to infer different beliefs from the same overt behavior. To the extent 
that these differences shape future cultural change, the replicator model captures 
only part of cultural evolution. 

The generativist model of phonological change illustrates the problem.
According to the generativist school of linguistics, individual pronunciation is
governed by a complex set of rules that takes as input the desired sequence of words
and produces as output the sequence of sounds that will be produced (Bynon,
1977). Generativists also believe that, as adults, people can modify their
pronunciation only by adding new rules that act at the end of the chain of existing
rules. Children, on the other hand, are not constrained by the rules used to 
generate adult speech. Instead, they induce the simplest set of grammatical rules
that will account for the performances they hear, and these may be quite different 
than the rules used by adult speakers. Although the new rules produce the same 
performance, they can have a different structure and, therefore, allow further 
changes by rule addition that would not have been possible under the old rules. 

The following example (from Bynon, 1977) illustrates this phenomenon. In
some dialects of English, people pronounce words that begin with wh using what 
linguists call an “unvoiced” sound while they pronounce words beginning with 
w using a voiced sound. (Unvoiced sounds are produced with the glottis open, 
resulting in a breathy sound, whereas voiced sounds are produced with the 
glottis closed, causing a resonant tone.) People who speak such dialects must
have mental representations of the two sounds and rules to assign them to 
appropriate words. Now suppose that people who speak such a dialect come into 
contact with other people who only use the voiced w sound. Further suppose 
that this second group of people is more prestigious, and accordingly people in 
the first group modify their speech so that they too use only voiced ws.
According to the generativists, they will accomplish this change by adding a new
rule that says “voice all unvoiced ws.” So, Larry wants to say Whether it is better
to endure. The part of his brain that takes care of such things looks up the mental 
representations for each of the words, including whether, which has an unvoiced 
w (because that is the way Larry learned to speak as a child). Then after any
other processing for stress or tone, the new rule changes the unvoiced w in 
whether to a voiced w. Children learning language in the next generation never 
hear an unvoiced w, and, according to generativists, they adopt the same
underlying representation for whether and weather. Thus, even though there is no
difference in the phenotypic performance among parents and children, children do
not acquire the same mental representation as their parents. This difference may 
be important because it will affect further changes. For example, it might make 
it less likely that the two sounds would split again in the future. The adult 
version of the rule still has a latent distinction between the voiced and unvoiced 
pronunciation that could serve as the basis for renewing the distinction, whereas, 
if the generativists are correct, the latent distinction is unavailable to child 
learners who hear only one usage. 


Replicators Are Not Necessary for Cumulative 
Adaptive Evolution 

We also doubt that replicators are necessary for the cumulative evolution of 
complex features. Here is an example of a transmission system that does just 
that. When you speak, the kind of sounds that come out of your mouth depends 
on the geometry of your vocal tract. For example, the consonant p in spit is 
created by momentarily bringing your lips together with the glottis open.
Narrowing the glottis converts this consonant to b as in bib. Leaving the glottis open
and slightly opening the lips produces pf, as in the German word apfel (apple).
Linguists have shown that even within a single speech community individuals 
vary in the exact geometry of the vocal tract used to produce any given word. 
Thus, it seems plausible that individuals vary in the culturally acquired rule
about how to arrange the inside of the mouth when they are speaking any
particular word. Languages vary in the sounds used and this variation can be very
long-lived. For example, in dialects spoken in the northwest of Germany, p is 
substituted for pf in apfel and many similar words. This difference arose about AD 
500 and has persisted ever since (Bynon, 1977).

So how are different rules governing speech production transmitted from 
generation to generation? Consider two models. 

First, suppose that each child learning language is exposed to the speech of a 
number of adults. These adults vary in the way that they produce the pf sound in 
apfel. Each child figures out how she would need to position her tongue to 
produce the same pf sound as each adult model, and then she adopts one of these
as her own rule. Here, a mental rule that governs speech production is
transmitted from one individual to another. The mental rule is a replicator; it clearly
has fidelity. It has longevity because it potentially persists for generations, and it 
would have fecundity if the rule was more attractive than competing rules. And 
because it is a replicator, it can evolve. 

Now consider a second model. As before, children are exposed to the speech 
of a number of adults who vary in the way that they pronounce pf. Each child 
unconsciously computes the average of all the pronunciations that he hears and 
adopts the tongue position that produces this average. Here, mental rules are not 
transferred from one brain to another. The child may adopt a rule that is unlike 
any of the rules in the brains of his models. The rules in particular brains do not 
replicate because no rule is copied faithfully. The phonological system can 
nonetheless evolve in a quite Darwinian way. More attractive forms of
pronunciation can increase if they have a disproportionate effect on the average.
Rules affecting different aspects of pronunciation can recombine and thus lead to 
the cumulative evolution of complex phonological rules. It is true that the act of 
averaging will tend to decrease the amount of variation in the population each 
generation. However, phenotypic performances will vary as a result of age, social 
context, vocal tract anatomy, and so on. Learners will often misperceive an 
utterance. These sorts of errors in transmission will keep pumping variation into 
a population as averaging bleeds it away. In fact, averaging might be necessary to 
prevent high noise levels from injecting too much variation into the population 
(see Cavalli-Sforza and Feldman, 1981; Boyd and Richerson, 1985). 
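
The balance between averaging and transmission error can be illustrated with a
small simulation. This is a minimal sketch under stated assumptions, not the
models analyzed in the works cited above: pronunciation is reduced to a single
number, each child hears `n_models` randomly chosen adults, each utterance is
misperceived with Gaussian error of standard deviation `noise_sd`, and the
population size, starting spread, and all function names are hypothetical.
With `blend=True` the child adopts the average of what it hears (the second
model). With `blend=False` it adopts one heard pronunciation (a noisy version
of the first model).

```python
import random
import statistics


def next_generation(adults, n_models, noise_sd, blend, rng):
    """Produce one child per adult slot from a list of adult pronunciations."""
    children = []
    for _ in range(len(adults)):
        models = rng.sample(adults, n_models)
        # Each utterance is perceived with some error.
        heard = [m + rng.gauss(0.0, noise_sd) for m in models]
        if blend:
            children.append(sum(heard) / n_models)  # average of all models
        else:
            children.append(rng.choice(heard))      # copy a single model
    return children


def simulate(blend=True, pop_size=2000, n_models=3, noise_sd=1.0,
             generations=30, seed=0):
    """Return the population variance of pronunciation after `generations`."""
    rng = random.Random(seed)
    pop = [rng.gauss(0.0, 5.0) for _ in range(pop_size)]
    for _ in range(generations):
        pop = next_generation(pop, n_models, noise_sd, blend, rng)
    return statistics.pvariance(pop)


if __name__ == "__main__":
    print("averaging, with error:   ", simulate(blend=True))
    print("averaging, no error:     ", simulate(blend=True, noise_sd=0.0))
    print("single model, with error:", simulate(blend=False))
```

Under these assumptions, if V is the variance among adults and E the variance
of perception error, the variance among children who average n models is about
(V + E)/n, which settles near E/(n - 1) instead of falling to zero. Without
error, averaging removes nearly all variation. Without averaging, error
accumulates from generation to generation.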

There are still other possibilities that differ even more radically from the 
replicator model. For example, a propensity to imitate the common type in the 
population can be coupled with high rates of individual learning to create a model 
in which there is little heritable variation at the individual level, but substantial 
heritability of group differences (Henrich and Boyd, 1998). In such a model 
the cumulative evolution of adaptive complexity can occur, and occur rapidly, 
through selective processes that act at the group level (Boyd and Richerson, 1990, 
2002). Similarly, in recent models of the evolution of social institutions (Young, 
1998), there is no cultural transmission at the individual level. Although
individuals simply acquire the best response to their social environment by
trial-and-error learning, the structure of social interactions creates persistent, heritable
variation at the group level.

We do not understand in detail how culture is stored and transmitted, so we
do not know whether culturally transmitted ideas and beliefs are replicators or 
not. If the application of Darwinian thinking to understanding cultural change 
depended on the existence of replicators, we would be in trouble. Fortunately, 
culture need not be closely analogous to genes. Ideas must be gene-like to the 
extent that they are somehow capable of carrying the cultural information
necessary to give rise to the cumulative evolution of complex cultural patterns that
differentiate human groups. They exhibit the essential Darwinian properties of 
fidelity, fecundity, and longevity, but, as the example of phonemes shows, this 
can be accomplished by a most ungene-like, replicatorless process of error-prone 
phenotypic imitation. All that is really required is that culture constitutes a
system maintaining heritable variation.


Darwinian Models Are Useful 

Science on the frontier often has an anarchic, nervy flavor because it must deal 
with multiple uncertainties. Of course, we would be better off knowing exactly 
what memes are. Papering over the uncertainties of how culture is stored and 
transmitted no doubt leads to errors and conceals areas of fruitful inquiry. But as 
the psychologists explore one part of the frontier, the evolutionists should probe 
others. Studying the population properties of cultural information has lots of 
implications for human cognitive psychology, and vice versa. For example, when 
a child has the chance to copy the behavior of several different people, does she 
choose a single model for a given, discrete cultural attribute? Or does she
average, or in some other way combine, the attributes of alternative models? The
minute you try to build a population model of culture, you see that this question 
is crucial. However, despite conducting thousands of experiments on social 
learning, psychologists apparently have never thought to answer this question. 
Just as at a four-way stop, it makes no sense for everyone to wait for everyone 
else. Watch what the other drivers are doing, certainly, but go whenever the road 
ahead is clear. 

Many social scientists have reacted to the advent of Darwinian models of 
culture with palpable distaste (e.g., Hallpike, 1986), while others have embraced 
these ideas with enthusiasm (e.g., Runciman, 1998). Much of this variation can 
be explained by people’s feelings about the current Balkanization of the social 
sciences. The world of social science is divided into self-sufficient “ethnies” like 
anthropology and economics that are content to follow the questions and
presuppositions that govern their discipline. The inhabitants of this world regard
other disciplines with a mixture of fear and contempt and take little interest in 
what they have to say about questions of mutual interest. Clearly, this is not a 
satisfactory state of affairs. 

We believe that Darwinian models can help rectify this problem. Disciplines 
such as economics, psychology, and evolutionary biology take the individual as 
the fundamental unit of analysis. These disciplines differ about how to model the 
individual and his psychology, but because they have the same fundamental 
structure, there has been much substantive interaction between them. Nowadays, 
many economists and psychologists work closely together, and a rich new body
of work, often called “behavioral economics,” has rapidly become mature 
enough to be applied to important practical problems such as the effect of 
retirement accounts on national savings rates. In the same way, economists and 
evolutionary biologists have found it relatively easy to work together on
evolutionary models of social behavior, a rapidly growing field in both disciplines.

Other disciplines like cultural anthropology and sociology emphasize the 
role of culture and social institutions in shaping behavior, and researchers in 
sociology, anthropology, and history find interaction with each other relatively 
comfortable. Bridging the gap between the individual and cultural disciplines has 
proved much more difficult. Darwinian models are useful precisely because they 
incorporate both points of view within a single theoretical framework in which 
individuals and culture are articulated in a way that captures some, if not all, of 
the properties that their respective specialists claim for them. In
population-based models, culture and social institutions arise from the interaction of
individuals whose psychology has been shaped by their social milieu. As a bonus,
Darwinian models come with tools to investigate the population-wide,
long-term consequences of the interactions between individuals and their culture and
social institutions. 

To see how useful population-based models can be, consider the problem of 
human cooperation. There is no coherent explanation for the vast scale of
cooperation in contemporary human societies, or why the scale of cooperation has
increased many 1000-fold over the last 10,000 years. Models in economics and 
evolutionary biology predict that cooperation should be limited to small groups 
of relatives and reciprocators. Many theories in anthropology simply assume 
(often implicitly) that cooperative societies are possible and that culturally
transmitted beliefs and social institutions serve the interest of social groups, but 
no attempt is made to reconcile this assumption with the fact that people are at 
least partly self-interested. Darwinian models provide one cogent mechanism to 
explain human cooperation by identifying the conditions under which groups will 
come to vary culturally and predicting when such variation will lead to the spread 
of culturally transmitted beliefs that support large-scale cooperation (Soltis, 
Boyd, and Richerson, 1995). In such models, the effect of different culturally
transmitted beliefs on group prestige and group survival shapes the kinds of 
beliefs that survive and spread. These group-level effects in turn influence what 
people want and what they believe and, therefore, their behavior. Other recent 
work on the evolution of institutions (Young, 1998; Richerson and Boyd, 2002)
makes us optimistic that Darwinian models may have widespread utility. 

Population thinking is also useful because it offers a way to build
mathematical theory of human behavior that captures the important role of culture in
human affairs. Mathematical theory has the great advantage of allowing
conclusions to be reliably deduced from assumptions. Experience in economics and
evolutionary biology also suggests that it leads to a kind of clear understanding 
that is difficult to achieve with verbal reasoning alone. Of course there is also a 
cost—mathematical theory is necessarily based on simplified models. However, 
the combination of mathematical and verbal reasoning is superior to either 
alone.

Memes are not a universal acid, but population thinking is a better
mousetrap. Population modeling of culture offers social science useful
conceptual tools and handy mathematical machinery that will help solve important,
long-standing problems. It is not a substitute for rational actor models, or careful 
historical analysis. But it is an invaluable complement to these forms of analysis 
that will enrich the social sciences.

260-70

Coordination, 85, 119 
tends to produce multiple stable 
equilibria, 300-1 
Corvids, 79 

Costly information. See Information, costly 
Costly signals, 270-1 
Cuisine, 99 

Cultural adaptation rapid and cumulative, 143, 261

Cultural component 
definition, 326 

historical linguistic examples of, 327-8 
Cultural descent with modification, 54 
Cultural ecologists, 5 
Cultural evolution 
definition of, 274-5, 339 
derived from Bayesian assumptions, 376 
dynamics of innovation, 353-5 
impossible to control, 269 


intellectual history of, 7 
is a population process, 428-9 
is Darwinian, 427-9 
isolating processes, 311 
Lamarckian, 256 

mechanisms leading ESS amount of 
imitation, 391-3 
origins of agriculture as a natural 
experiment in, 338 
path dependence important, 299 
processes regulating rate of, 360-1 
rapid, 4 

rate of, 143, 346 

sketch of Darwinian theory of, 399-402 
synthetic role of theory of, 433 
theory as a plausibility argument, 412 
theory of as tools for historians, 285 
timescales of, 355 

ultimate versus proximate role for, 259 
understanding using Darwinian methods, 
287 

Cultural explanations, prejudice against, 6 
Cultural group selection, 17, 134, 198, 
260-4, 274, 433 

and evolution of altruistic punishment, 
241-9 

how works, 206-8 

payoff based imitation form fast, 91-5, 
141-3, 229-38 

rate of in New Guinea highlands, 143, 
204, 218-21 

rate relatively slow, 228-9 
roles of fast and slow forms of, 239 
spreads cooperation and punishment, 192 
on subgroups within a society, 221 
Cultural inertia, 380 

Cultural meaning as force for coherence, 320 
Cultural phylogenies, 284 
in assemblages of coherent units, 318 
cultures as species, 317-8 
current practice for reconstructing, 324-6 
in hierarchically integrated cultural 
systems, 318 
reconstructing, 317-32 
when cultures are collections of 
ephemeral entities, 318-9 
Cultural recombination, 236-8 
difficulty for maintaining cultural 
coherence, 327 
Cultural transmission, 56-7 
component of model of ethnic 
boundaries, 111 
empirical evidence for, 208-22 
evolution of psychological capacities 
for, 58 

Cultural variation, 270, 421 
can respond to group selection, 134 
decision-making maintains, 220 
definition of, 53 

farming practices as example of, 422 
maintained by social enhancement versus 
imitation, 44 
maintenance of group level, 206-7, 213-4, 261

model assumptions supported, 218 
not environmental variation, 53-4 
Culture 

allows humans to transcend evolutionary 
imperatives, 103 
analogy with genes, 377, 399 
can “domesticate” genes, 401 
can only be understood historically, 287 
common in nature, 52 
complexity of human traditions, 77 
creates novel evolutionary tradeoffs, 8 
a Darwinian evolutionary system, 4 
definition, 3, 6, 105, 252, 287 
derived not ancestral system, 425-7 
evolutionarily active, 104 
evolution of capacities for, 9, 52-3, 104 
as evolving (review of processes) 258-9 
how increases average fitness, 39-44 
improves human adaptability, 36-51 
information stored in human brains, 421, 
423-5 

maintains heritable variation, 432 
meaning as a force for coherence, 320 
neither autonomous nor prisoner of 
genetic constraints, 103 
not necessarily replicated, 429 
origins of, 399 
in other animals, 53-6, 76-8 
part of human biology, 4 
population thinking necessary to 
understand, 421 

as powerful adaptive system, 10 
role in evolution of human cognition, 70 
role of innate information in, 424 
a system of inheritance, 103, 389, 399, 422 
Cultures 

core principles, ultimate sacred postulates, or root paradigms of, 320
history of cultures not “pure” 
hypotheses about structure of, 317-9 
in non-human animals, 425-7 
of organizations, 380 
phylogenies of, 284, 317-32 
population-like versus species-like, 311, 
333 

Cumulative cultural evolution, 49-50, 52, 
100 

adaptation to climate chaos, 143 
fast and frugal heuristics and complex 
adaptive behavior, 191 
important in humans, 54-5, 424
little evidence for in non-humans, 54
origins of, 104 
Cushitic languages, 328-9 

Darwinian methods for study of cultural
evolution, 258-9, 287, 339, 378, 400

Darwinian review of models of cultural 
evolution, 288-90 


Darwinian social science, 6 
Darwinian theory both scientific and historical, 287-8
Datoga people, 329 

Decision-making forces, 290, 400. See also 
Biased cultural transmission 
Decision theory, 20, 33 
Democracy, 270 
Demographic transition, 366 
Denmark, spread of agriculture in, 360 
Descent in cultural evolution. See also Phylogenetic reconstruction of cultural
descent

Bantu political traditions as case, 325-6 
comparison of core and small unit 
hypotheses, 330 

of core traditions: evidence for, 322-6 
cultures as wholes: evidence for, 322 
Indo-European historical linguistics as 
case, 325 

mechanisms causing longevity and 
coherence, 319-22 
of memes, 331 

of small cultural components: evidence, 
326-30 

when impossible or uninteresting, 331-2 
Descent in organic evolution, 312-6 

common properties of genes and species, 
316-7 

of genes, 312-3 
of species, 315 

when phylogenies reticulated, 313-5 
Design complexity 

IBM 370 microprocessor as example, 
296-7 

number of qualitatively distinct 
equivalent optima, 296-9 
very large number of local optima, 293-5

Design tradeoffs in evolution of minds, 73 
Development, role of environment, 8 
Developmental constraints on responses to 
selection, 298 
Dialect, 99, 108 
Dialect evolution, 268 
Diffusion of innovations, 107, 323 
Dinka people, 221, 261 
Divergent evolution, versus convergent 
evolution, 292 

Diversity of cultural and natural processes, 
398 

favors use of toolkit of simple models, 
402-4 

Domestication 
bean, 357 
goosefoot, 356 
maize, 356-7, 360 
root crops, 357 
squash, 357 
sunflower, 356 
Dress, 99 

Dual inheritance theory, 103 
Dugum Dani people, 212 
Dynamism of plant and beetle populations 
in Pleistocene epoch, 345 

Eastern North America, spread of 
agriculture in, 356 
Eastern Woodland societies, 344 
Economic inequality, 266 
Economists, 376 
Efficiency, definition of, 364-5 
Egalitarian societies, 266 
Empirical tests of simple models, 404 
Encephalization, 68, 76, 79 
Engineers, 376 
English language, 327-8, 424 
Environment 

dimensionality very large, 70 
novel, 70 

Environmental variability. See also 
Pleistocene climates 
cultural adaptations to, 15 
extreme in glacial periods, 17 
favors evolution of social learning, 25-9, 
32 

Ethnic boundaries 

predictions about the nature of, 129-30 
testing model of, 113 
Ethnic markers 

acting to isolate cultures, 311 
requirements for the evolution of, 
122-8 

Ethnicity, 99, 118 
example of entwining of genes and 
culture, 104 
Ethnocentrism, 100 
of scientific disciplines, 415, 432 
Eurasia, isolation of cultures in, 334 
Europe, spread of agriculture in, 338, 359 
European nation-states, 266 
Evoked culture, 70 
Evolution 

as always multilevel, 256-8 
of genes controlling social learning, 24-32 
of complex cognition, 66 
logic of genetic and cultural similar, 
255-6 

of social learning, 21-32 
of social organization slow, 345 
of tribal social instincts, 260-4 
in variable environments, 25-9 
Evolutionarily stable strategy approach, 24 
Evolutionary biologists, 376 
Evolutionary biology as source of concepts 
and methods to study cultural 
evolution, 105 

Evolutionary equilibrium for reciprocity in 
large groups, 153-7 

Evolutionary mechanisms, as generating 
historical contingency, 284 
Evolutionary psychology, 424-5 
extreme version of information-rich 
modules argument implausible, 425 


Evolutionary social science, as a 
methodology consistent with 
many theories, 259-60 
Evolutionary social scientists on culture, 8 
Evolutionary theory 
as accounting system, 6 
of culture, 255-6 
not reductionistic, 377 
recursive and multi-level, 255-8 
Experimental games, 271 
Explanations, hard versus soft, 6 
Exploitative elites, 266 
External versus internal explanations, 16 

Faiwolmin tribal area, 214-5 
Family level societies, 262. See also 
Small-scale societies 
Fang people, 424 

Female circumcision (genital mutilation) 
328, 329 

Fertile crescent, spread of agriculture in, 350 

Fish, 54, 294, 425 

Fitness 

malignant functions, 298 
maximizing models example of 
generalized sample theory, 405 
peak shifting on complex topographies, 
299-300 

topography metaphor, 295-7 
Flemish language group, 217 
Florida climate record, 343 
Folk theorem, 84, 135, 139, 178 
Folk wisdom, 394-5 
Food preferences socially transmitted in 
rats, 54 

Food taboos, 206 
Fore, 210, 212, 216 
France, Upper Paleolithic societies 
in, 262 

Free riders, second order, 140, 189, 247 
Freemasonry, 329 
French language, 328 
Functionalism, 84, 204, 251, 291 
ahistorical, 294 
limits to, 220 
Fundamentalist sects, 268 

Gahuku people, 212 
Galton’s problem, 332 
Game theory, 95 
Gebusi people, 216, 330 
Gene flow and phylogeny, 314 
Gene-culture coevolution, 4-5, 9, 116, 144, 
199-200, 254 

builds cultural imperatives into the genes, 
264 

genes “domesticated” by culture, 401 
led to evolution of tribal social instincts, 
263-4 

reshaped human nature, 270 
Generalized sample theories, 404-6 
kin selection an example, 410 
General systems, versus purpose learning 
systems, 71-5 

Generativist model of phonological change, 
430-1 

Generous tit-for-tat, 138 
Genes 

affecting cultural transmission, 58 
analogy with culture shallow, 377 
prosocial selected by cooperative cultural 
norms, 199-200 

Genetic constraints on responses to 
selection, 298-9 
Genetic value defined, 306 
Genetic variation, 53, 270 
German language, 327-8 
Germany, growth of democracy in, 270 
Ghosts, 424 
Global warming, 362 
Goeammer people, 216 
Gombe, chimpanzees at, 426 
Great Basin, 262-3, 333 
Great Plains, ecology and ceremony in, 
333 

Greco-Roman urban civilization imposed 
upon barbarians, 321 
Greek city-states, 267 
Greenland, climate variation in, 344 
Greenland ice cores, 340-4, 349 
Group beneficial strategy, conditions for 
spread, 233-6 

Group boundaries. See also Ethnic 
boundaries and related topics 
differences strongest at, 126 
permeable, 99 

Group explanations, versus individual level 
explanations, 5 
Group extinction, 209 
Group level cultural recombination, 
236-8 

Group selection, 141-3, 257. See also 
Cultural group selection 
cultural versus genetic, 249 
Darwin’s tribal group selection 
hypothesis, 251 
evidence against, 260 
interdemic, 241 

thought plausible in human case by 
prominent evolutionists, 275 
Gunwingga people, 275 
Gusii people, 329 

Han China, 266 
Hapsburgs, 329 

Hawaii, population build-up in, 359 
Heuristics, 17, 70, 71 
Himalayas climate record, 343 
Historical change 

defined as divergence in similar 
environments, 292-3 
defined as non-stationary, 291-2 
and phylogenetic reconstruction, 310 
product of chaotic dynamics, 303-3 


product of developmental or genetic 
constraints, 298-9 

product of evolution on rough fitness 
topographies, 294-7, 299-300 
product of multiple stable equilibria 
(see also Folk theorem) 300-1 
product of random forces, 293 
slow change requires evolutionary 
explanation, 221 
versus general laws, 283 
Historical linguistics, 284, 310 
wave versus genetic models of linguistic 
evolution, 330 

Historical traces, longevity of, 319-20 
Historical versus scientific explanation, 283, 
288, 290-1, 303-6 
cannot be disentangled as separate 
enterprises, 304 
dichotomy false, 291 
Holism, 320 

Holocene epoch, 67, 339, 340, 348-9, 362 
Housecats, 76 
Huli people, 212 
Human sociobiology 
debate about, 103 

depends upon decision-making forces, 
290, 400 

example of a plausibility argument, 
411-2 

subject of dubious programmatic attacks, 
413 

a successful research program, 416-7 
as theory of utility functions, 395 
useful exploration of the limits of fitness 
optimizing models, 409-10 
Humans, wide range compared to other 
primates, 10 

Human uniqueness, 4, 133 
Hunting and gathering, 273 
frontiers with agriculturalists sometimes 
stabilize, 360 

made efficient use of plants in Holocene, 
348-9 

persisted unusually long in western North 
America, 339 
Pleistocene, 345 

relation to origins of agriculture, 357 
use of plants in Pleistocene, 358-9 
Huron society, 344 
Hybrid zone, 316 

Hypothesis testing versus testing plausibility 
arguments, 410-2 

IBM 370 microprocessor, 295 
Ice cores, 17, 67, 340-1 
Ideal types, 397 
Ilaga Dani people, 212 
Imagined communities, 268 
Imitation. See also Observational 
learning 

allows cumulative improvement, 42-4 
allows selective learning, 40-2 
capacity for cannot increase when rare, 
59-61 

comparative study in chimpanzees and 
children, 77-8 
definition, 44 

evolutionary equilibrium amount of, 41 
experimental evidence for in non-human 
animals, 56 

must have arisen by natural selection, 76 
requires special-purpose cognitive 
machinery, 60 
true, definition, 54-5, 77 
why adaptive, 85-9 
Imitators versus learners, 15 

fitnesses in Rogers’s model, 36-40 
Imprinting, 74 

Inclusive fitness. See Kin selection 
Indian caste system, 330 
Indians, of Western North America, 217 
evolution of Plains culture after 
introduction of horses, 219 
Northwest Coast, 262 
Indirect reciprocity, 146, 257, 270-1 
language and, 275 
Individual learning. See Learning, 
individual 

Individual level processes, as not explaining 
historical change, 221 
Indo-European expansion, 333 
Indo-European language reconstruction, 
325 

Information, 3 

costly, 8-9, 17, 379, 391-2, 410, 412 
costs leading to alternative plausibility 
argument, 412 
imperfect, 20, 391-2, 410 
innate, 70 

large reservoirs of information, 423 
noisy, 14 

prestige and conformity biases 
adaptations to uncertainty of, 232 
Inheritance of acquired variation, 14 
Inherited habits, 289 

Innate programming versus individual and 
social learning, 75-7 
Institutions 
definition, 253 
evolution of, 260-70 
highly variable, 253-4 
important in human behavior, 253 
product of cultural evolution, 253-5 
tribal scale, 262-3 
Intelligence, 69, 78 

Interdisciplinary study, social sciences as 
insufficient in, 375 
Inuit people, 54 

Irian Jaya, ethnographic data from, 205, 217 

Islam, 329 

Italy 

civic institutions in Northern versus 
Southern, 270 
climate change in, 343 


Jale people, 212 

Japan, spread of agriculture in, 356 

Japanese macaque, potato washing in, 55 

Jaqai people, 212 

Jarmo, early farming site at, 338 

Jate people, 212 

Joint stock companies, 239 

Jomon culture, 358-9 

Kalahari, 262, 272, 423 
Kalenjin language group, 217 
Kapauku people, 212 
Kenya, 130 
Kikuyu people, 130 
Kin selection, 105, 146, 189, 257 
example of robust sample model, 410 
Kiwai people, 212 
Kukukuku people, 212 
Kuma people, 212 
Kuria people, 329 
Kuru disease, 216 

Lago Grande de Monticcio core, 343 
Lamarckian inheritance (and evolution) 14, 
290, 400, 409 
Language 

examples of coherence of small cultural 
units, 328 

an index of cultural phylogeny, 320, 324 
linguistic diversity as adaptive barriers to 
communication, 271-2 
role in indirect reciprocity, 275 
Western North American Indians, 217 
Language-technology coevolution, 333 
Last glacial climate, 340-4 
Leadership, 266-7, 269 
Learning, individual, 19, 66, 391-2 
accuracy and cost determines value of 
social learning, 28, 32, 43 
costly, 35 

social learning multiplies power of, 70 
versus social learning and innate 
programming, 75-7, 86 
Learning, social. See Social learning 
Legislature, and group-functional behavior, 
221 

Legitimate institutions, 269-70 
Little Ice Age, 344 
Lizard, 54, 425 

Local enhancement defined, 55, 426-7. See 
also Imitation; Observational learning 
Local population as source of valuable 
information, 100 
Logistic equation, 351 

Ma’a language, 328 

Macaque, 105 

MacArthur Foundation, 249 

Machiavellian intelligence, 273 

Mae Enga people, 209, 219 

Mahale Mountains, chimpanzees in, 426 

Mailu people, 212 
Maize, 356, 394 
Maladaptations, 9-11, 18 

conformist transmission stabilizes, 192 
maintained by moralistic punishment, 
177 

origin by runaway evolution of symbolic 
systems, 116 

predictable byproduct of cultural 
transmission, 10, 395, 400 
Maladaptations in social arrangements 
result of abuses of elite power, 266 
result of failures of hierarchical 
bureaucracies, 267 

Maladaptive consequences of workarounds 
coercive dominance, 266 
difficulty in creating and maintaining 
trust, 269-70 
segmentary hierarchy, 267 
symbolically marked groups, 268 
Mana, concept of, 217 
Mander cultural area, 216 
Marind-Anim people, 212, 217 
Mariner’s compass, 242 
Maring people, 209-10, 212 
Marker trait, 107 
Markets, 223, 269 
Mass media, 268 

Mathematical models. See also Models 
needed to study population level 
phenomena, 105, 433 
simple models versus complex 
phenomena, 376-7 
Mayans, 325 
Meaning, 320 

Mediterranean, climate change in, 345 

Melpa people, 212 

Memes 

critique of, 377, 434 
as mind viruses, 323 
Mendi people, 210 
Mental representations, 57 
Mesoamerica, spread of agriculture in, 356 
Mesolithic societies, 348, 360 
Mexico, 344, 356, 360 
Migration, 141 

and phylogeny, 314 

Wright island and stepping-stone models 
of, 142 

Military, 265, 267 

Millennial and sub-millennial scale climate 
variation, 67-8, 340-4 
Millingstones, 359 
Mind design, 74, 79 
Models. See also Simple models 
of basic population level processes, 
383-4 

of Bayesian decision-maker with access to 
traditional information, 381-93 
of biased imitation, 388-9 
as caricatures, 404 
complex critiqued, 377, 402-3 
continuous versus discrete, 57 


of cooperation and punishment with 
conformity and group selection, 
192-202 

of cultural evolution, 105-6 
of cultural recombination at group level, 
236-8 

of cumulative learning, 49-50 
of dispersal, 352-3 

of dynamics of innovation, 353-5, 363-4 
of evolutionarily stable amount of 
tradition, 384-8 

of evolution of conformity, 89-90 
of evolution of ethnic markers, 106-14, 
119-28 

of evolution of reciprocal cooperation, 
146-60, 162-3 

of evolution of reciprocity with 
retribution (punishment) 170-7, 
179-86 

of evolution of social learning, 56-64 
of fast form of group selection, 91-5 
of gene-culture coevolution, 199-200 
of heterogeneous environments, 386-8 
of how group beneficial equilibria spread, 
231-6 

of individual and social learning, 21-32 
of learning and imitation by Rogers, 36-7 
of learning and imitation in variable 
environment, 45-9 

of natural selection (on cultural variation) 
389-91 

of population dynamics, 363 
of population pressure with diffusion and 
innovation, 351-5 

of replicator dynamics in a structured 
population, 229-38 
of simulation of evolution of altruistic 
punishment, 243-8 
strategy for addressing controversial 
questions, 412 

tradeoffs between generality, realism, and 
accuracy in constructing, 405 
utility of simple, 403-4 
verbal unreliable, 377, 405 
Modern societies versus small-scale 
societies, 264-5 

Modular organization of cognition, 69-70, 
73-4 

Monkey, 55 

Moralistic punishment, 91, 134, 138-40, 
167, 248 

stabilizes anything, 176-9 
Moral systems, 84 
Mormon norms, 84 
Multilevel nature of evolution, 256-8 
Multiple stable equilibria, 261. See also Folk 
theorem 
Myth, 217 

NaDene peoples, 325 
Naidjbeedj cultural area, 216 
Naked mole rat, 189 
Natufian culture, 348, 350-1, 357, 359 
Natural selection 

abstract category, 255 
acting on cultural variation, 10, 259, 289, 
392, 400 

conditions to favor increased reliance on 
social learning, 20-33 
generally favors small brains, 68 
shapes learning rules adaptively, 20 
Nature-nurture dichotomy, 8 
controversy confused, 413 
Navaho people, 324 
Neanderthals, 272 
Near East, 338-9, 356-8 
Neo-Darwinian synthesis, 315 
Neolithic societies, 348, 357, 360 
New group formation, as important to 
cultural group selection, 211-3 
New Guinea, 143, 205, 217-8, 261, 267, 
323, 330, 356 
South Coast, 217 

New World, isolation of cultures in, 334 
New Zealand, population build-up in, 359 
Nilotic peoples, 328-9 
Non-human animals, culture of, 425-7 
Non-parental transmission, fitness costs and 
benefits, 401 

Non-stationary time series, 291-2 
Norms, 119 

Athapaskan as an example of cultural 
persistence, 321 
definition, 84 

functional versus dysfunctional, 228 
group beneficial, 227-8, 238 
help people make good decisions cheaply, 
83-4 

persistence explained, 227-8 
North America, spread of agriculture in, 338 
North China, spread of agriculture in, 338, 
356 

Northwest Europe, 356 
Nuer people, 221, 261, 273 
Numic languages, 333 

Observational learning. See also Imitation 
critical to cumulative cultural evolution 
in humans, 427 
defined, 54-5, 426-7 
limited to humans, 55 
requires special psychological 
mechanisms, 56 
Octopus, 52 

Ohalo II archaeological site, 358 
Ok people, 212, 213-4, 323-4 
Orangutan, 56 

Paleolithic societies, 262 
Palio (horse race of Siena), 267 
Papago people, 425 
Parrot, 79 

Path dependence, 292 

Pavlov reciprocating strategy, 136 


Phenotypic flexibility, 67 
Phylogenetic reconstruction in biology 
classification, 310 
detection of constraints, 311 
inferences about history, 310 
Phylogenetic reconstruction of cultures. 
See also Descent in cultural evolution 
comparison of core and small unit 
hypotheses, 330 
core traditions: evidence, 322-6 
cultures as wholes: evidence for, 322 
partial phylogenies and the study of 
adaptation, 332-3 
why important, 332 
Pigeon, 52, 70 

Plant intensive subsistence systems, 345-6 
Plausibility arguments versus conventional 
hypothesis testing, 410-2 
human sociobiology as example of, 411 
versus programmatic attacks, 413 
Pleistocene climates, 16, 74, 143 
climate seasonality, 365 
deterioration of, 67-8 
hunter-gatherers under, 345 
millennial and sub-millennial scale 
variation, 340-4 
role in cognitive evolution, 66 
role in deterring agriculture, 338, 339-44 
Pleistocene epoch, 67-8, 354-5, 362 
Poland, Solidarity movement in, 145 
Police, 265-6 
Pollen record, 345 
Polynesia, 217, 325, 359 
ranked lineage system, 266 
Popperian falsificationism improper 
epistemology for ecology and 
evolution, 411 
Population growth 

has wrong time scale to explain origins of 
agriculture, 351 

limited by growth of subsistence, 354 
Population level properties 

and complex cultural traditions, 16 
of culture, 8-10 

implications for cognitive psychology, 
432 

linkage to individual level, 110 
necessary to explain rates of historical 
change, 221 

similar in the cases of genes and culture, 
105, 289, 378 
of social learning, 20 
Population pressure, 285 
Population thinking 

Darwin’s most fundamental contribution, 
420 

necessary to understand culture, 421 
Pottery, 359 

Power, abuse of, 264-70 
Preferences, 254-5 
Prestige, 273 

Prestige and charisma, 267 
Prestige systems, 15. See also Biased 
transmission, success or prestige based 
Price’s covariance equation, 141 
Progressive evolution, 66, 284 
Property rights, 269 
Protestantism, 329 
Protoculture, 425-7 
Proximate versus ultimate causes, 9-10, 
256, 259 
Public goods, 85 
Punishment, 84, 189 
altruistic, 169, 241-9 
cooperation favored by, 246-8 
selection within groups against weak, 243 
stabilizes a wide variety of behaviors, 129, 
139-40 

stabilizes norms, 91 
Purari people, 217 
Puritans, 91 

Raiapu Enga people, 212 
Random forces, 289, 400 
Rat, 20, 52, 70, 426 
black versus Norway, 77 
Rational actor model (rational choice 
theory) 5-6, 84, 190, 238, 252, 273 
critique of, 379-80 

incomplete without theory of tradition, 
391, 393 

second generation bounded, 254 
Rational planning, 223 
Rational self-interest, as failing to explain 
experimental data, 145 
Reciprocity, 134, 135-41, 189 
effects of kin selection in pairs versus 
larger groups, 159-60 
evolution in large groups, 137-41, 146, 
148-60, 168-9 

evolution in pairs, 147-9, 166, 167-8 
evolution when groups form assortatively, 
158-60 

in large groups rare in nature, 162 
limitations of model, 161-2 
Reductionism, 106, 377, 408-9 
simple models only tactical, 408 
Religion, 84, 217, 265, 329 
established, 267 
proselytizing, 239 
Replicators 

not necessary for cumulative adaptive 
evolution, 430-2 
properties of, 429 
Retribution, definition, 167 
Risk of crop yields, 344 
Robust models, 404 
Rock climbing, 327 
Roman legal system, 266 
Rules of thumb, 107. See also Heuristics 

Salish people, 330 

Sample theories, 404-6 

Santa Barbara Basin core, 341-3 


Scientific controversies, 414 
Scientific laws, 283 

Scientific versus historical explanation, 283, 
288, 290-1, 303-6 
cannot be disentangled as separate 
enterprises, 304 
dichotomy false, 291 
Scrub jay, 105 

Second (and higher) order cooperation, 140, 
169, 175-9, 189-90 
Segar cultural area, 216 
Segmentary hierarchy, 266-7 
Self-control, norms solve problem of, 84, 85 
Self-interest 

choices result in group-beneficial 
behavior, 222-3 
versus group function, 205 
Self-justifying ideologies, 268 
Settlement size, 262 
Shoshone people, 262-3 
Northern, 263, 273 
Show-off hypothesis, 275 
Siena, tribal social instincts in, 267 
Sierra Leonean Creoles, as adopting 
Freemasonry, 329 

Simple models, 18. See also Models 
choice of problem, 407 
critique of, 397 
empirical tests of, 406 
fitness optimization example of, 398 
limitations, 107 

modularization of analysis, 407-8 
often more useful than complex models, 
403-4 

only tactically reductionistic, 408 
of social learning, 20 
strategy of construction, 406-13 
toolkit of as theory, 283 
Simple versus complex adaptive 
topographies, 295 
Skinner box, 70 
Slavery, 268 

Small-scale societies, 261. See also 
Hunter-gatherers 

Social enhancement, definition of, 44 
Social intelligence, 273 
Social intelligence hypothesis, 78 
Social learning, 66. See also Cultural 
transmission; Culture; Imitation 
adaptation to variable environments, 70-2 
adaptive function of, 20 
comparative study of chimpanzees and 
children, 77 

component of general purpose learning 
system, 71 
definition, 20 

does not necessarily increase mean fitness, 
35 

model of, 13, 15 

in non-human animals, 53-6, 76-8 
simple systems common, 16 
students of, 107 
versus individual learning and innate 
programming, 75-7, 86 
with multiple models, 29-32 
Social sciences 
critique of, 375-6 

declaration of independence from biology, 
413-4 

Sociobiology. See Human sociobiology 
Sociolinguistics, 270 

Sociology of science of Donald Campbell, 
414-5 

Songbirds, 20. See also Birdsong dialects 
Sonoita, “Devil’s Road” at, 425 
South America, spread of agriculture in, 356 
South China, spread of agriculture in, 356 
Southern Africa climate record, 343 
Southwest (U.S.) 324, 356, 360 
Spain, 262, 360 
Sparrows, 425 

Spatially structured population necessary for 
origin of ethnic markers, 124 
Spatially variable environments, 75 
and Rogers’s model of learning and 
imitation, 36-7 
Speciation, 115 
Strategic modeling, 74-5, 79 
Sub-Saharan Africa, 270, 356 
Success based biased transmission. See Biased 
transmission, success or prestige based 
Sudan, cultural group selection in, 261 
Sun Dance, 329 

associated with buffalo hunting, 333 
Symbolic culture, 99 
force for coherence, 320 
language-like productivity of, 268 
Symbolically marked groups. See also 
Ethnicity; Ethnic markers 
boundaries, 18, 320 

in complex societies, diversity of, 267-8 

and cooperation, 272 

cultural substitute for speciation, 115 

of diverse types, 265 

origins of, 99-100 

testing model of, 113-4 

Tabu (taboo) 217, 223 
Tasmanian effect, 272 
Technology, 272 
Terik people, 328 
Terrorist organizations, 268-9 
Theory, as toolkit of models, 376-7, 397-8, 
404 

Thousand year rule of Lumsden and Wilson, 
407 

Tibet, 325 

Tierra del Fuego, 272 
Tiriki people, 328 
Tit-for-tat, 136 

Toolkit of models as theory, 376-7, 408 
Tor peoples, 211, 212, 216-7 
Tradition. See also Culture 
acts like a system of inheritance, 394 


sometimes reliable source of information, 
379 

strong reliance on sensible, 394 
Tribal social instincts hypothesis, 260-4 
Tribal societies, tendency to intertribal 
anarchy, 267 

Tribal-scale institutions, 273 

science as tribal scale enterprises, 415 
Tribe 

definition, 262 
institutions of, 262-4 
Trust, 269, 273. See also Cooperation 
Tupi speakers, 325 

Ultimate causes of cooperative behavior, 
256-7 

versus proximate causes, 9-10, 256 
Understanding versus predicting, 377 
Upper Paleolithic (late Pleistocene) 
societies, 262. See also Small-scale 
societies 

contrasted with contemporary societies, 
264-5 

essentially modern social instincts, 264 
Upper Paleolithic period, 272, 357-8, 360 
Usufura people, 210 

Vampire bats, 189 

Variable environments. See also Agriculture, 
origins of; Environmental variability; 
Millennial and sub-millennial scale 
variation; Pleistocene climates 
brains’ adaptations to, 73 
cognitive complexity, 66 
Plio-Pleistocene, 67-8 
role in favoring imitation, 43 
spatially, 107 

Variation. See Cultural variation; Genetic 
variation 

Venezuela, climate change in, 343 
Verbal reasoning, as unreliable, 377, 433 

Waf people, 216 
Walloon language group, 217 
Western North America, persistence of 
hunting and gathering in, 339 
Western North America climate records, 343 
White American Southerners, 268 
White-crowned sparrow, 426 
“Why possibly” explanations in 
evolutionary biology, 305 
Within versus between group models of 
equilibrium selection, 228-9 
Wola people, 212 
Work-around hypothesis, 264-70 
coercive dominance as, 265-6 
legitimate institutions as, 269-70 
segmentary hierarchy as, 266-7 
symbolically marked subgroups as, 267-9 

Younger Dryas, 343, 348, 350 
Yuma, “Devil’s Road” at, 425 



